FLAIR modules

flair.py is a wrapper script with modules for running various processing scripts located in bin/. Modules are assumed to be run in order (align, correct, collapse), but the user can forgo the wrapper if a more custom build is desired.

flair align

Aligns reads to the genome using minimap2, and converts the aligned minimap2 sam output to BED12 and optionally PSL. Aligned reads in psl format can be visualized in IGV or the UCSC Genome browser. As for which human reference genome to use, Heng Li has written a blog post on this.

Alternatively, the user can align the reads themselves with their aligner of choice and convert sorted bam output to bed12 using bin/bam2Bed12.py to supply for flair-correct. This step smooths gaps in the alignment.

Usage:

python flair.py align -g genome.fa -r <reads.fq.gz>|<reads.fq>|<reads.fa> [options]

run with --help for a description of optional arguments. Outputs (1) sam of raw aligned reads and (2) smoothed bed12 file of aligned reads to be supplied to flair-correct.

flair correct

Corrects misaligned splice sites using genome annotations and/or short-read splice junctions. Based on common user issues we have encountered, for flair-correct to run properly, please ensure/note that (1) the genome annotation and genome sequences are compatible, (2) gtf is preferred over gff for annotation and annotations that do not split single exons into multiple entries are ideal, (3) Bedtools is in your $PATH, and (4) kerneltree is properly installed (you may need to install Cython first). You may also want to refer to the installation requirements and/or use the conda environment for flair.

Usage:

python flair.py correct -q query.bed12 -g genome.fa [options]

run with --help for description of optional arguments. Outputs (1) bed12 of corrected reads, (2) bed12 of reads that weren’t able to be corrected, (3) psl of corrected reads if the -c chromsizes file is provided. Either (1) or (3) can be supplied to flair-collapse as the query.

Short-read junctions

To use short-read splice sites to aid with correction, one option is bin/junctions_from_sam.py to extract splice junctions from short-read alignments. The -s option accepts either sam or bam files, and if there are multiple sams/bams they can be provided in a comma-separated list.

Usage:

python junctions_from_sam.py -s <shortreads.sam>|<shortreads.bam> -n outname

The file that can be supplied to flair-correct with -j is in the output file outname_junctions.bed. It is recommended that the user remove infrequently used junctions i.e. junctions with few supporting junction reads, which are in the 5th column of the junction bed file. For example, if you wanted to do the filter out junctions with fewer than 3 supporting short reads, you could use awk '{ if ($5 > 2) { print } }' outname.sj_junctions.bed > outname.filtered.bed.

Alternatively, the -j argument for flair-correct can also be generated using STAR. STAR 2-pass alignment of short reads automatically produces a compatible splice junction file (SJ.out.tab). We recommend filtering out junctions with few uniquely mapping reads (column 7).

flair collapse

Defines high-confidence isoforms from corrected reads. As FLAIR does not use annotations to collapse isoforms, FLAIR will pick the name of a read that shares the same splice junction chain as the isoform to be the isoform name. It is recommended to still provide an annotation with -f, which is used to rename FLAIR isoforms that match isoforms in existing annotation according to the transcript_id field in the gtf. Intermediate files generated by this step are removed by default, but can be retained for debugging purposes by supplying the argument --keep_intermediate and optionally supplying a directory to keep those files with --temp_dir.

If there are multiple samples to be compared, the flair-corrected read bed or psl files should be concatenated prior to running flair-collapse. In addition, all raw read fastq/fasta files should either be specified after -r with space/comma separators or concatenated into a single file.

Usage:

python flair.py collapse -g genome.fa -r <reads.fq>|<reads.fa> -q <query.psl>|<query.bed> [options]

run with --help for description of optional arguments.

Outputs the high-confidence isoforms in several formats: (1) *isoforms.psl or .bed, (2) *isoforms.gtf, as well as (3) an *isoforms.fa file of isoform sequences. If an annotation file is provided, the isoforms ID format will contain the transcript id, underscore, and then the gene id, so it would look like ENST*_ENSG* if you’re working with the GENCODE human annotation. If multiple TSSs/TESs are allowed (toggle with --max_ends or --no_redundant), then a -1 or higher will be appended to the end of the isoform name for the isoforms that have identical splice junction chains and differ only by their TSS/TES. For the gene field, the gene that is assigned to the isoform is based on whichever annotated gene has the greatest number of splice junctions shared with the isoform. If there are no genes in the annotation which can be assigned to the isoform, a genomic coordinate is used (e.g. chr*:100000).

flair quantify

Convenience function to quantifying FLAIR isoform usage across samples using minimap2. If isoform quantification in TPM is desired, please use the --tpm option. If the user prefers salmon to quantify transcripts using their nanopore reads, please specify a path to salmon using --salmon. For all options run flair-quantify with --help.

Usage:

python flair.py quantify -r reads_manifest.tsv -i isoforms.fasta [options]

Inputs:

1. reads_manifest.tsv is a tab-delimited file containing the sample name, condition, batch*, and path to reads.fq/fa. For example:

sample1 conditionA  batch1  ./sample1_reads.fq
sample2 conditionA  batch1  ./sample2_reads.fq
sample3 conditionA  batch2  ./sample3_reads.fq
sample4 conditionB  batch1  ./sample4_reads.fq
sample5 conditionB  batch1  ./sample5_reads.fq
sample6 conditionB  batch2  ./sample6_reads.fq

* The batch descriptor is used in the downstream flair-diffExp analysis to model unintended variability due to secondary factors such as batch or sequencing replicate. If unsure about this option, leave this column defined as batch1 for all samples.

2. isoforms.fasta contains FLAIR collapsed isoforms produced by the flair collapse module.

Outputs: (1) counts_matrix.tsv which is a tab-delimited file containing isoform counts for each sample. In the output, the values in the manifest file are concatenated with underscores so please do not use underscores in the manifest file. For example:

ids samp1_conditionA_batch1 samp2_conditionA_batch1 samp3_conditionA_batch2 ...
0042c9e7-b993_ENSG00000131368.3 237.0   156.0   165.0   150.0   ...
0042d216-6b08_ENSG00000101940.13    32.0    14.0    25.0    ...

flair diffExp

Performs differential isoform expression, differential gene expression, and differential isoform usage analyses between multiple samples with 3 or more replicates. For differential isoform usage analysis between samples without replicates, please use the diff_iso_usage.py standalone script. This module requires additional python modules and R packages which are described below:

Additional Requirements

  1. python v2.7+ and python modules: pandas, numpy, rpy2

  2. DESeq2

  3. ggplot2

  4. qqman

  5. DRIMSeq

  6. stageR

Usage:

python flair.py diffExp -q counts_matrix.tsv -o output_directory [options]

Inputs: (1) counts_matrix.tsv is a tab-delimited file generated by the flair quantify module.

Outputs: (1) Files contained in the output_directory are tables and plots generated from the various R-packages used in this analysis, including raw DESeq2/DRIMSeq output tables with foldChange, isoform frequency and adjusted pvalues.

flair diffSplice

Calls alternative splicing events from isoforms. Currently we support the following AS events: intron retention, alternative 3’ splicing, alternative 5’ splicing, and cassette exons.

Usage:

python flair.py diffSplice -i <isoforms.bed>|<isoforms.psl> -q counts_matrix.tsv [options]

If there are 3 or more samples per condition, then the user can run with --test and DRIMSeq will be used to calculate differential usage of the alternative splicing events between two conditions. Run with --help to see more DRIMSeq-specific arguments. If conditions were sequenced without replicates, then the flair-diffsplice output files can be input to the diffsplice_fishers_exact.py script for statistical testing instead.

Inputs: (1) -i is a tab-delimited isoforms.bed/psl file generated by the `flair-collapse flair collapse module. (2) -q is a tab-delimited counts_matrix.tsv file generated by the flair quantify module.

Outputs: (1-4) 4 tab-delimited files for each AS event type. If DRIMSeq was run, and a file with PSIs for each sample and the corresponding p-values for each event type (5-8).

For a complex splicing example, please note the 2 alternative 3’ SS, 3 intron retention, and 4 exon skipping events in the following set of isoforms that flair-diffSplice would call and the isoforms that are considered to include or exclude the each event:

_images/toy_isoforms_coord.png
a3ss_feature_id     coordinate                  sample1 sample2 ... isoform_ids
inclusion_chr1:80   chr1:80-400_chr1:80-450     75.0    35.0    ... a,e
exclusion_chr1:80   chr1:80-400_chr1:80-450     3.0     13.0    ... c
inclusion_chr1:500  chr1:500-650_chr1:500-700   4.0     18.0    ... d
exclusion_chr1:500  chr1:500-650_chr1:500-700   70.0    17.0    ... e
ir_feature_id           coordinate      sample1 sample2 ... isoform_ids
inclusion_chr1:500-650  chr1:500-650    46.0    13.0    ... g
exclusion_chr1:500-650  chr1:500-650    4.0     18.0    ... d
inclusion_chr1:500-700  chr1:500-700    46.0    13.0    ... g
exclusion_chr1:500-700  chr1:500-700    70.0    17.0    ... e
inclusion_chr1:250-450  chr1:250-450    50.0    31.0    ... d,g
exclusion_chr1:250-450  chr1:250-450    80.0    17.0    ... b
es_feature_id           coordinate      sample1 sample2 ... isoform_ids
inclusion_chr1:450-500  chr1:450-500    83.0    30.0    ... b,c
exclusion_chr1:450-500  chr1:450-500    56.0    15.0    ... f
inclusion_chr1:200-250  chr1:200-250    80.0    17.0    ... b
exclusion_chr1:200-250  chr1:200-250    3.0     13.0    ... c
inclusion_chr1:200-500  chr1:200-500    4.0     18.0    ... d
exclusion_chr1:200-500  chr1:200-500    22.0    15.0    ... h
inclusion_chr1:400-500  chr1:400-500    75.0    35.0    ... e,a
exclusion_chr1:400-500  chr1:400-500    56.0    15.0    ... f