Performance is measured as positive predictive value (PPV), also known as precision: TP/(TP+FP), the ratio of true positive calls to all positive calls, true and false. It indicates how much cross-calling has been observed between the desired barcode pairs. To compute a PPV, distinct amplicons of known length and origin are barcoded, sequenced, demultiplexed, and mapped back to the set of known references. With this approach, true and false positive calls can be counted per barcode pair. A PPV below 100% reflects misidentification by the demultiplexing algorithm, which can be caused by many external factors, such as poorly synthesized barcode molecules, contamination between barcode wells, and insert contamination during library preparation.
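As a minimal sketch of the counting described above, the following computes a PPV per barcode pair from a list of calls. The function name `ppv_per_barcode_pair`, the tuple layout, and the toy data are illustrative assumptions, not part of lima; in practice the "true" pair would come from mapping each read back to its known reference.

```python
from collections import Counter


def ppv_per_barcode_pair(calls):
    """Return {called_pair: TP / (TP + FP)}.

    `calls` is an iterable of (called_pair, true_pair) tuples, one per
    mapped read. This layout is a hypothetical one chosen for illustration.
    """
    tp = Counter()
    fp = Counter()
    for called_pair, true_pair in calls:
        if called_pair == true_pair:
            tp[called_pair] += 1  # call agrees with the known origin
        else:
            fp[called_pair] += 1  # cross-call into `called_pair`
    pairs = set(tp) | set(fp)
    return {pair: tp[pair] / (tp[pair] + fp[pair]) for pair in pairs}


# Toy data: four reads called as (bc1, bc1), one of which truly
# originates from the (bc2, bc2) amplicon.
calls = [
    (("bc1", "bc1"), ("bc1", "bc1")),
    (("bc1", "bc1"), ("bc1", "bc1")),
    (("bc1", "bc1"), ("bc1", "bc1")),
    (("bc1", "bc1"), ("bc2", "bc2")),  # false positive for (bc1, bc1)
    (("bc2", "bc2"), ("bc2", "bc2")),
]
ppv = ppv_per_barcode_pair(calls)
```

Here `ppv[("bc1", "bc1")]` is 0.75 (3 true positives out of 4 calls) and `ppv[("bc2", "bc2")]` is 1.0.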
PPV varies with the barcoding mode (same or different barcodes on the two ends of the insert) and with the number of barcodes used.
Examples of different barcoding schemes, with (x) indicating use of a barcode pair:
8-plex same / symmetric
28-plex different / asymmetric
36-plex same+different / symmetric+asymmetric
The following libraries contain 2kb amplicons, amplified with vector-sequence-specific primers. Sequencing movies are 6 hours long, with an additional 2 hours of pre-extension. The instrument version is 5.0.0 and the chemistry is S/P2-C2. For each ZMW, all sequenced barcode regions were taken into account.
- With increasing number of barcodes, PPV decreases.
- Same barcode pair libraries have higher PPV than different barcode pair libraries.
- Mixing same and different barcode pairs in one library leads to a very low PPV and is not supported.
After PPV, yield is the next most important metric. Lima removes undesired barcode pairs to increase PPV, accepting a decrease in yield.
Example 384-plex symmetric (look at the bars above the x-axis):
Compare it to a 384-plex asymmetric run:
Yield decreases in the asymmetric case because, in order to identify a ZMW as asymmetric, both flanking barcodes of an insert have to be observed; ZMWs whose polymerase read does not contain at least two adapters have to be removed. In contrast, in the symmetric case, it is sufficient to see a single barcode region.
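The effect of this rule on yield can be sketched with a toy model. Everything below is an illustrative assumption rather than lima's actual logic: the helper names are hypothetical, and the model assumes that one observed adapter provides one barcode region, so the symmetric case needs at least one adapter and the asymmetric case at least two.

```python
def is_callable(num_adapters: int, asymmetric: bool) -> bool:
    """Toy rule: asymmetric needs >= 2 adapters, symmetric >= 1."""
    required = 2 if asymmetric else 1
    return num_adapters >= required


def yield_fraction(adapter_counts, asymmetric: bool) -> float:
    """Fraction of ZMWs that survive the adapter-count rule."""
    kept = sum(is_callable(n, asymmetric) for n in adapter_counts)
    return kept / len(adapter_counts)


# Hypothetical number of adapters seen in each of six polymerase reads.
adapter_counts = [0, 1, 1, 2, 3, 5]

symmetric_yield = yield_fraction(adapter_counts, asymmetric=False)
asymmetric_yield = yield_fraction(adapter_counts, asymmetric=True)
```

With these made-up counts, five of six ZMWs are usable in the symmetric case but only three of six in the asymmetric case, because the two single-adapter reads are dropped.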
A 96-plex barcoded adapter library with a 2kb insert and a 30-hour movie; ccs version 5.0.0, lima version 2.0.0.
ZMWs input                (A) : 2045937
ZMWs above all thresholds (B) : 1937358 (94.69%)
ZMWs below any threshold  (C) : 108579 (5.31%)

ZMW marginals for (C):
Below min length              : 0 (0.00%)
Below min score               : 0 (0.00%)
Below min end score           : 0 (0.00%)
Below min passes              : 0 (0.00%)
Below min score lead          : 7651 (7.05%)
Below min ref span            : 292 (0.27%)
Without SMRTbell adapter      : 0 (0.00%)
Undesired diff pairs          : 104029 (95.81%)
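To make the percentages in this summary easier to read, the short check below recomputes them from the raw counts. It is only a sanity-check sketch; the variable names are assumptions. It shows that (B) and (C) are percentages of (A), while each marginal is a percentage of (C).

```python
# Counts copied from the summary above.
zmws_input = 2045937   # (A)
zmws_above = 1937358   # (B)
zmws_below = 108579    # (C)

# Non-zero marginals for (C).
marginals = {
    "Below min score lead": 7651,
    "Below min ref span": 292,
    "Undesired diff pairs": 104029,
}

# (B) and (C) partition (A).
partition_ok = zmws_above + zmws_below == zmws_input

# (B) and (C) are reported relative to (A).
pct_above = round(100 * zmws_above / zmws_input, 2)
pct_below = round(100 * zmws_below / zmws_input, 2)

# Each marginal is reported relative to (C), not (A).
pct_marginals = {
    name: round(100 * count / zmws_below, 2)
    for name, count in marginals.items()
}

# The marginals add up to more than (C).
marginal_sum = sum(marginals.values())
```

The marginals sum to 111972, which is more than the 108579 ZMWs in (C); this is consistent with a single ZMW being able to fall below more than one threshold.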
The raw numbers for the PPV/yield curve follow. Yield percentages are relative to the 1937358 ZMWs above all thresholds (B). The initial 0.44% yield loss is due to how the data is processed for PPV analysis, which requires at least 600 bp mapped to the originating reference.
Without any filtering, PPV is 99.96%; with the recommended
--min-score 80, PPV increases to 99.992%, at the cost of an additional 2.2% yield loss.
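Because both PPV values are close to 100%, the gain is easier to see as a miscall rate, that is, 100% minus PPV. The sketch below does that arithmetic with the two PPV values quoted above; the variable names are illustrative, and `Decimal` is used only to avoid floating-point noise in such small differences.

```python
from decimal import Decimal

ppv_unfiltered = Decimal("99.96")    # percent, without any filtering
ppv_min_score = Decimal("99.992")    # percent, with --min-score 80

# Miscall rate in percent: the share of calls that are false positives.
miscall_unfiltered = Decimal("100") - ppv_unfiltered
miscall_min_score = Decimal("100") - ppv_min_score

# How many times fewer miscalls remain after filtering.
fold_reduction = miscall_unfiltered / miscall_min_score

# The same rates expressed per million calls.
per_million_unfiltered = miscall_unfiltered * Decimal("10000")
per_million_min_score = miscall_min_score * Decimal("10000")
```

The miscall rate drops from 0.04% to 0.008%, a five-fold reduction, or from 400 to 80 miscalls per million calls.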