FusionHunter: identifying fusion transcripts using paired-end RNA-seq



Author and Code License

Prerequisites

Update

Source Code

Codes including:

Annotation package

Install

Note from authors

Inputs (highlighted are mandatory)

L
Left (first) part of the paired-end reads, in FASTQ format. Should be named as XXXX/1.
R
Right (second) part of the paired-end reads, in FASTQ format. Should be named as XXXX/2.
Reference
The directory containing the reference genome in FASTA format. Reference genomes for various species can be downloaded from the UCSC Genome Browser. After downloading the per-chromosome .fa files, merge them into a single file, e.g. for assembly hg18, use the command: cat chr*.fa > hg18.fa. Include only the 'major' chromosomes in the reference genome. Undetermined scaffolds must be excluded, since they might lead to spurious fusion outputs.
BowtieIdx
The path to the Bowtie index, given as the index directory followed by the base name of the index, with NO trailing '/'. For example, the hg18 Bowtie index consists of hg18.1.ebwt, hg18.2.ebwt, hg18.3.ebwt, hg18.4.ebwt, hg18.rev.1.ebwt and hg18.rev.2.ebwt, so the base name of the index is 'hg18' and BowtieIdx = DirtoBowtieIndex/hg18.
Gene_annotation
Directory and file name of the gene annotation list. We suggest the UCSC annotation. We provide the hg18 Gene_annotation in our annotation package (file name hg18.ucscKnownGene). For species other than human, users can download a gene annotation file from the UCSC Table Browser; the first 10 columns should follow the GenePred table format, and the last column should be the gene name.
Repeats
Directory and file name of the repeat region annotation. We provide the Repeats annotation for hg18 in our annotation package (file name hg18.repeats). OPTIONAL in other species. Leave blank if not available for your data.
SelfAlign
Directory and file name of the self-alignment regions. We provide the SelfAlign annotation for hg18 in our annotation package (file name hg18.chain.pairs). OPTIONAL in other species. Leave blank if not available for your data.
EST
Directory and file name of the human EST database. We provide the EST annotation for hg18 in our annotation package (file name hg18.SpliceEST). OPTIONAL in other species. Leave blank if not available for your data.
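
The Reference and BowtieIdx requirements above can be checked before a run. The following is a minimal sketch in Python, not part of FusionHunter: the function names are hypothetical, and the definition of 'major' chromosomes (numbered chromosomes plus chrX, chrY and chrM) is an assumption that should be adapted to your species. Unlike the plain cat chr*.fa command, it skips files such as chr1_random.fa.

```python
import re
from pathlib import Path

# Assumption: 'major' chromosomes are chr<number>, chrX, chrY and chrM.
MAJOR_CHROM = re.compile(r"^chr(\d+|X|Y|M)\.fa$")

# Index file suffixes listed in the BowtieIdx description.
BOWTIE_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                   ".rev.1.ebwt", ".rev.2.ebwt")


def major_chromosome_files(names):
    """Keep only per-chromosome FASTA names such as chr1.fa or chrX.fa."""
    return sorted(n for n in names if MAJOR_CHROM.match(n))


def merge_fasta(ref_dir, out_path):
    """Concatenate the major-chromosome FASTA files in ref_dir into out_path."""
    ref_dir = Path(ref_dir)
    names = major_chromosome_files(p.name for p in ref_dir.glob("chr*.fa"))
    with open(out_path, "wb") as out:
        for name in names:
            out.write((ref_dir / name).read_bytes())
    return names


def missing_bowtie_index_files(base):
    """Return the index files expected for this base name that do not exist."""
    base = str(base).rstrip("/")  # BowtieIdx must not end with '/'
    return [base + s for s in BOWTIE_SUFFIXES if not Path(base + s).is_file()]
```

For example, merge_fasta("hg18_chroms", "hg18.fa") would write the merged reference from a hypothetical hg18_chroms directory, and an empty list from missing_bowtie_index_files("DirtoBowtieIndex/hg18") suggests that the base name is set correctly.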

Outputs

fusion_output
Output file for the fusions reported by FusionHunter. Default is FusionHunter.fusion.
readthrough_output
Output file for the readthrough transcripts reported by FusionHunter. Default is FusionHunter.readthrough.


BASIC OPTIONS (highlighted are mandatory)

IF_HUMAN
Set 1 if running on hg18/hg19; otherwise 0.
CORE
Number of cores used for Bowtie alignment. Since Bowtie alignment is the most time-consuming step in FusionHunter, we suggest you use as many cores as possible.
segment_size
Size of the partial reads. It should not be longer than half of the full read length, and we strongly suggest you use exactly half, e.g. 25 if the RNA-seq reads are 50 bp.
PAIRNUM
Minimum number of paired-end reads that support a fusion (encompassing a fusion junction). Default is 2.
MINSPAN
Minimum number of junction-spanning reads that support a fusion. Default is 1. NOTE: if you set MINSPAN = 1, then, in order to reduce false positives, any candidate junction supported by only 1 spanning read is discarded UNLESS the fusion junction point lies exactly on an annotated exon boundary. This filter is embedded in FusionHunter.
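
The segment_size suggestion and the PAIRNUM/MINSPAN rules above can be summarized in a short Python sketch. This is only an illustration of the rules as described here, not FusionHunter's actual code; the function and argument names are hypothetical.

```python
def suggested_segment_size(read_length):
    """Half of the full read length, rounded down so it never exceeds half."""
    return read_length // 2


def passes_support_filter(pair_count, span_count, on_exon_boundary,
                          pairnum=2, minspan=1):
    """Apply the PAIRNUM and MINSPAN rules to one candidate junction.

    pair_count       -- paired-end reads encompassing the junction
    span_count       -- reads spanning the junction itself
    on_exon_boundary -- True if the junction point lies exactly on an
                        annotated exon boundary
    """
    if pair_count < pairnum:
        return False
    if span_count < minspan:
        return False
    # A single spanning read is accepted only on an annotated exon boundary.
    if span_count == 1 and not on_exon_boundary:
        return False
    return True
```

With the defaults, a candidate with 2 encompassing pairs and 1 spanning read passes only when its junction lies on an annotated exon boundary.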

ADVANCED OPTIONS

MASK
If set to 1, repetitive regions in the reference genome are filtered out when performing gapped alignment. If set to 0, no filtering is done on the reference genome. Default is 1.
TILE
Size of exact match for each junction flanking tile. Default is 4.
MINOVLP
Minimum size of the maximum base coverage on either side of the junction. Default is 8.
REAPTOVLP
Maximum allowed repeat proportion of a read (used in reduceBwt). Default is 0.6.
CHAINNUM
Number of chains to overlap with a read (used in reduceBwt). Default is 20.
RPTOVLP
Maximum allowed repeat proportion of a read (more stringent, used in leftRightOvlp). Default is 0.2.
CHAINOVP
Maximum allowed alignment proportion between a pair of reads (used in leftRightOvlp). Default is 0.2.
CHAINDIS
Distance to self-chain boundary (used in postLeftRightOvlp). Default is 200000.
READOVLP
Proportion of a read that overlaps with a region (used in regionPairs). Default is 0.8.
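
To make the "repeat proportion of a read" used by REAPTOVLP and RPTOVLP concrete, here is a minimal Python sketch. It assumes half-open coordinates on a single chromosome and treats the threshold as an exclusive upper bound; both are assumptions, and the function names are hypothetical rather than taken from FusionHunter.

```python
def repeat_proportion(read_start, read_end, repeats):
    """Fraction of the read interval [read_start, read_end) covered by repeats.

    repeats is an iterable of (start, end) half-open intervals on the same
    chromosome; overlapping repeat intervals are counted only once.
    """
    length = read_end - read_start
    if length <= 0:
        raise ValueError("read interval must have positive length")
    # Clip each repeat to the read and sort by start position.
    clipped = sorted(
        (max(s, read_start), min(e, read_end))
        for s, e in repeats
        if s < read_end and e > read_start
    )
    covered = 0
    cur_end = read_start
    for s, e in clipped:
        s = max(s, cur_end)  # skip bases already counted
        if e > s:
            covered += e - s
            cur_end = e
    return covered / length


def exceeds_repeat_limit(read_start, read_end, repeats, limit=0.6):
    """True if the repeat proportion is above the limit (0.6 is the REAPTOVLP default)."""
    return repeat_proportion(read_start, read_end, repeats) > limit
```

Passing limit=0.2 gives the more stringent check described for RPTOVLP.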



Reference

Specificity and Sensitivity

To increase specificity (get more reliable results), set the variables PAIRNUM, MINSPAN, TILE and MINOVLP larger; to increase sensitivity (get more results), set them smaller.
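
As a rough illustration of this trade-off, the Python sketch below starts from the documented defaults and shifts all four thresholds together. The helper name, the uniform step and the lower bound of 1 are assumptions made for the example; in practice each option can be tuned independently in the FusionHunter configuration.

```python
# Documented defaults for the four thresholds.
DEFAULTS = {"PAIRNUM": 2, "MINSPAN": 1, "TILE": 4, "MINOVLP": 8}


def adjust_thresholds(settings, step):
    """Return a copy with every threshold shifted by step, never below 1.

    A positive step favors specificity; a negative step favors sensitivity.
    """
    return {name: max(1, value + step) for name, value in settings.items()}


stricter = adjust_thresholds(DEFAULTS, 1)
looser = adjust_thresholds(DEFAULTS, -1)
```

Here stricter raises each value by one, while looser lowers TILE and MINOVLP but leaves MINSPAN at 1, since it cannot go below the assumed floor.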

How to run FusionHunter

Sample

We use paired-end reads collected from the K-562-4 sample (a test sample extracted from SRX006134 in the NCBI SRA).

Bug report

Email: yangli9 AT illinois.edu

Contact

Yang Li and Jian Ma