Merging assemblies with cuffmerge

Cufflinks includes a script called cuffmerge that you can use to merge several Cufflinks assemblies. It also runs Cuffcompare for you and automatically filters out a number of transfrags that are probably artifacts. If you have a reference GTF file available, you can provide it to the script in order to gracefully merge novel isoforms and known isoforms and maximize overall assembly quality. The main purpose of this script is to make it easier to produce an assembly GTF file suitable for use with Cuffdiff. From the command line, run cuffmerge as follows:

cuffmerge [options]* <assembly_GTF_list.txt>
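As a concrete illustration of the usage line above, the following minimal Python sketch assembles a cuffmerge argument list from the options documented in this section (-g, -s and -p). The helper name `build_cuffmerge_command` and the file names `genes.gtf`, `genome.fa` and `assembly_GTF_list.txt` are placeholders chosen for this example, not part of Cufflinks.

```python
def build_cuffmerge_command(manifest, ref_gtf=None, ref_sequence=None, threads=1):
    """Return the argument list for a cuffmerge run (hypothetical helper)."""
    cmd = ["cuffmerge"]
    if ref_gtf is not None:
        # Optional reference annotation GTF (-g/--ref-gtf).
        cmd += ["-g", str(ref_gtf)]
    if ref_sequence is not None:
        # Optional genomic sequence directory or multifasta file (-s/--ref-sequence).
        cmd += ["-s", str(ref_sequence)]
    # Number of threads (-p/--num-threads).
    cmd += ["-p", str(threads)]
    # The manifest of assembly GTF files is the single positional argument.
    cmd.append(str(manifest))
    return cmd


cmd = build_cuffmerge_command(
    "assembly_GTF_list.txt",   # placeholder manifest name
    ref_gtf="genes.gtf",       # placeholder reference annotation
    ref_sequence="genome.fa",  # placeholder multifasta file
    threads=4,
)
print(" ".join(cmd))
# prints: cuffmerge -g genes.gtf -s genome.fa -p 4 assembly_GTF_list.txt
```

The list can be handed to `subprocess.run(cmd, check=True)` once cuffmerge is on the PATH; building the list first makes it easy to log or inspect the exact command before running it.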

##Cuffmerge input files

cuffmerge takes several assembly GTF files from Cufflinks as input. Input GTF files must be specified in a “manifest” file listing full paths to the files.

###Cuffmerge arguments

<assembly_GTF_list.txt>

Text file “manifest” with a list (one per line) of GTF files that you’d like to merge together into a single GTF file.
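One possible way to produce such a manifest is sketched below in Python. It resolves each GTF path to a full path and writes one per line. The helper name `write_manifest` and the sample directory names are assumptions made for this example; only the one-path-per-line layout comes from the description above.

```python
from pathlib import Path


def write_manifest(gtf_files, manifest_path):
    """Write one absolute GTF path per line and return the lines written."""
    # resolve() turns relative paths into full paths, as the manifest requires.
    lines = [str(Path(p).resolve()) for p in gtf_files]
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical layout: one Cufflinks output directory per sample.
# write_manifest(
#     ["sample1/transcripts.gtf", "sample2/transcripts.gtf"],
#     "assembly_GTF_list.txt",
# )
```

The resulting text file is then passed to cuffmerge as its positional argument.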

###Cuffmerge options

-h/--help

Prints the help message and exits

-o <output_dir>

Write the merged assembly into the directory <output_dir>. The default is ./merged_asm.

-g/--ref-gtf

An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.

-p/--num-threads <int>

Use this many threads to merge assemblies. The default is 1.

-s/--ref-sequence <seq_dir>/<seq_fasta>

This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats.

Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension.
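The naming rule for <seq_dir> can be checked before a run. The Python sketch below reports which chromosomes lack a matching .fa or .fasta file. The helper name `missing_chromosome_fastas` is hypothetical, and the list of chromosome names is assumed to be supplied by the caller (for example, collected from the first column of the input GTF files).

```python
from pathlib import Path


def missing_chromosome_fastas(seq_dir, chromosomes):
    """Return the chromosomes with no <name>.fa or <name>.fasta file in seq_dir."""
    seq_dir = Path(seq_dir)
    missing = []
    for chrom in chromosomes:
        # Each file must be named after the chromosome, with either extension.
        candidates = [seq_dir / f"{chrom}.fa", seq_dir / f"{chrom}.fasta"]
        if not any(c.is_file() for c in candidates):
            missing.append(chrom)
    return missing
```

An empty result means every listed chromosome has a sequence file with an accepted name; any names returned point to files that need to be added or renamed.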

##Cuffmerge output files

Cuffmerge produces a GTF file, merged.gtf, that contains the merged input assemblies.
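A quick sanity check on the merged file is to count its distinct transcripts. The Python sketch below assumes the standard GTF layout (nine tab-separated columns, with a `transcript_id "..."` entry in the ninth). The helper name `transcript_ids` is hypothetical, and the path to merged.gtf depends on where your run wrote its output.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect the distinct transcript_id values found in GTF lines."""
    ids = set()
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


# Example use (path is a placeholder):
# with open("merged.gtf") as handle:
#     print(len(transcript_ids(handle)))
```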