How can I demultiplex IsoSeq data?
Even if you only want to remove IsoSeq primers, lima is the tool of choice.
- Remove all duplicate sequences from the primer FASTA file.
- Annotate sequence names with a 5p or 3p suffix. Example:
    >primer_5p
    AAGCAGTGGTATCAACGCAGAGTACATGGGG
    >sample_brain_3p
    AAGCAGTGGTATCAACGCAGAGTACCACATATCAGAGTGCG
    >sample_liver_3p
    AAGCAGTGGTATCAACGCAGAGTACACACACAGACTGTGAG
- Use the --isoseq mode. Run it in combination with --peek-guess to remove spurious false positives.
- The output will contain only different pairs, each combining a 5p and a 3p primer:
    demux.primer_5p--sample_brain_3p.bam
    demux.primer_5p--sample_liver_3p.bam
These options are deliberately conservative: they remove spurious and ambiguous calls so that only proper asymmetric (barcoded) primers are used in downstream analyses. In good libraries, more than 75% of CCS reads pass the lima filters.
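The FASTA preparation steps above can be checked before running lima. The following is a minimal sketch, not part of lima itself: the helper names (parse_fasta, prepare_primers, expected_outputs) are hypothetical, the suffix check assumes names end in _5p or _3p as in the example, and the "demux" prefix is taken from the example output names. It drops duplicate sequences, rejects names without a suffix, and lists the 5p/3p file names that could appear.

```python
from itertools import product

# Example primer FASTA, copied from the steps above.
PRIMER_FASTA = """\
>primer_5p
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>sample_brain_3p
AAGCAGTGGTATCAACGCAGAGTACCACATATCAGAGTGCG
>sample_liver_3p
AAGCAGTGGTATCAACGCAGAGTACACACACAGACTGTGAG
"""


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def prepare_primers(records):
    """Validate the 5p/3p suffixes and drop duplicate sequences.

    The first record carrying a given sequence is kept.
    """
    seen, unique = set(), []
    for name, seq in records:
        if not name.endswith(("_5p", "_3p")):
            raise ValueError(f"{name!r} lacks a _5p or _3p suffix")
        if seq in seen:
            continue
        seen.add(seq)
        unique.append((name, seq))
    return unique


def expected_outputs(records, prefix="demux"):
    """List every possible <prefix>.<5p>--<3p>.bam name.

    This only enumerates candidate names; which files are actually
    written depends on the pairs observed in the data.
    """
    five = [n for n, _ in records if n.endswith("_5p")]
    three = [n for n, _ in records if n.endswith("_3p")]
    return [f"{prefix}.{a}--{b}.bam" for a, b in product(five, three)]


if __name__ == "__main__":
    primers = prepare_primers(parse_fasta(PRIMER_FASTA))
    for filename in expected_outputs(primers):
        print(filename)
```

Keeping the check separate from the lima run means a malformed primer file is caught early, before any reads are processed.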