NEXTflex™ RNA Fragmentation Buffer
NEXTflex™ First Strand Synthesis Primer
NEXTflex™ Directional First Strand Synthesis Buffer Mix
NEXTflex™ Rapid Reverse Transcriptase
NEXTflex™ Directional Second Strand Synthesis Mix
NEXTflex™ Adenylation Mix
NEXTflex™ Ligation Mix
NEXTflex™ Molecular Index Adapters (1 µM)
NEXTflex™ Uracil DNA Glycosylase
NEXTflex™ qRNA-Seq™ Universal Forward Primer (10 µM)
NEXTflex™ PCR Master Mix
NEXTflex™ qRNA-Seq™ Barcoded Primers (10 µM)
Required Materials not Provided
10 ng – 100 ng mRNA or rRNA-depleted RNA, or 10 ng – 10 µg total RNA
100% Ethanol (stored at room temperature)
80% Ethanol (freshly prepared)
2, 10, 20, 200 and 1000 µL pipettes
RNase-free pipette tips
Nuclease-free 1.5 mL microcentrifuge tubes
Thin wall nuclease-free 0.5 mL microcentrifuge tubes
96-well PCR Plate, Non-skirted (Phenix Research, Cat # MPS-499) or similar
Adhesive PCR Plate Seal (Bio-Rad, Cat # MSB1001)
Agencourt AMPure XP 60 mL (Beckman Coulter Genomics, Cat # A63881)
Magnetic Stand-96 (Ambion, Cat # AM10027) or similar, for post-PCR cleanup
Simplified Demultiplexing of Molecular Indexed Reads
NGS library preparation typically includes a PCR amplification step. While this step allows one to obtain sufficient material for sequencing, it also distorts the quantitative representation of sequences in the library through the disproportionate amplification of some products. Moreover, PCR amplification introduces technical errors in the sequences that become indistinguishable from biological variants in sample DNA or RNA, introducing bias into the downstream diagnostic analysis.
To overcome PCR bias, the USS method was introduced to eliminate duplicates by identifying fragments that share start and stop sites. This method assumes that the initial chemical or mechanical fragmentation of DNA or RNA is random, so that all fragments generated in this way differ in their start and/or stop sites. Consequently, only PCR duplicates should share identical start/stop sites, and these can be identified and removed. However, detailed analyses revealed that many original fragments have identical start and stop sites (1). As a result, applying the USS method alone incorrectly eliminates unique fragments from NGS data. Additionally, this method cannot be applied to amplicon-seq applications, because all fragments obtained from one target have the same start and stop sites. Applying this approach to other enrichment-based protocols, such as ChIP-Seq and CLIP-Seq, is similarly problematic, because the number of fragments with identical ends is high even before PCR amplification.
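The USS principle can be sketched as grouping aligned read pairs by their coordinates and keeping one pair per group. The minimal Python sketch below assumes each read pair has already been reduced to a hypothetical ReadPair record (transcript ID, fragment start, fragment stop); it illustrates the idea only and is not the qRNASeq script's implementation. It also shows the weakness described above: two distinct molecules that happen to share coordinates are collapsed into one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    # Hypothetical record: transcript ID plus fragment start/stop on that transcript
    transcript: str
    start: int
    stop: int


def uss_unique(pairs):
    """Keep the first read pair seen for each (transcript, start, stop) key."""
    seen = set()
    kept = []
    for p in pairs:
        key = (p.transcript, p.start, p.stop)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


pairs = [
    ReadPair("TX1", 100, 350),
    ReadPair("TX1", 100, 350),  # PCR copy OR an independent molecule with the same ends
    ReadPair("TX1", 120, 390),
]
print(len(uss_unique(pairs)))  # 2
```

Coordinates alone cannot tell whether the second TX1 100–350 pair is a PCR copy or an independent molecule, so USS discards it in both cases.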
Stochastic labeling (tagging with a unique molecular identifier) has been used by a number of researchers to solve this problem and improve the representativeness of NGS analysis (2-5). Nucleic acid fragments can be indexed either by ligation of labeled Y-shaped adapters, called Molecular Indexes™ (for RNA-Seq and ChIP-Seq applications), or by using a labeled RT primer (for CLIP-Seq applications). These Molecular Indexes can be applied either alone or in combination with the USS method to eliminate duplicates and to correctly detect somatic mutations (6-7).
Bioo Scientific offers two kits, the NEXTflex™ qRNA-Seq™ Kit V2 and the NEXTflex™ Rapid Directional qRNA-Seq™ Kit, for the construction of molecularly indexed libraries. The principle behind these protocols is the random introduction of 96 molecular labels at both ends of each cDNA fragment prior to PCR amplification, allowing for 9216 (96 × 96) unique combinations of labels. The chance that two fragments share both the same start/stop coordinates and the same combination of molecular indices is exceedingly small, so molecular indexing can be used in combination with the USS method to distinguish distinct starting molecules that happen to share start/stop coordinates from true PCR duplicates. This allows retention of reads that would otherwise be discarded as PCR duplicates, enabling more accurate quantitative analysis. The percentage of fragments with the same start and stop sites that is retained by molecular indexing depends on the starting material, sequencing depth, and library preparation method.
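The label arithmetic can be made concrete with a short calculation. The sketch below assumes, as an idealization, that each molecule receives one of the 96 labels at each end independently and uniformly at random (real ligation efficiencies may not be perfectly uniform). Under that assumption, two distinct molecules with identical coordinates receive the same label pair with probability 1/9216, and for k such molecules the chance that any two collide follows the standard birthday-problem formula.

```python
from math import prod

LABELS_PER_END = 96
COMBINATIONS = LABELS_PER_END ** 2  # 9216 possible label pairs


def p_any_collision(k, n=COMBINATIONS):
    """Probability that at least two of k distinct molecules sharing the same
    start/stop coordinates receive the same label pair.
    Assumes independent, uniform label assignment (an idealization)."""
    if k > n:
        return 1.0  # pigeonhole: a collision is certain
    return 1.0 - prod((n - i) / n for i in range(k))


print(COMBINATIONS)        # 9216
print(p_any_collision(2))  # 1/9216, about 1.1e-4
for k in (5, 10, 50):
    print(k, round(p_any_collision(k), 4))
```

The probability stays small while only a handful of molecules share coordinates, which is the situation in which combining labels with USS is most useful.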
Bioo Scientific now offers a complementary qRNASeq script, released under the GNU General Public License (GPL) and created by Weihong Xu of the Stanford Genome Technology Center. The script, along with instructions and a .txt file, can be downloaded from the Resources tab on the NEXTflex qRNA-Seq Kit v2 and NEXTflex Directional qRNA-Seq Kit product pages. Using read pairs aligned to transcripts together with the FASTQ files, the script generates a table listing fragments, their start/stop sites in transcripts, and their molecular labels (also known as stochastic labels, or STLs). It also generates a table listing, for each transcript, the total number of read pairs and the number of read pairs remaining after STL, USS, and STL/USS correction.
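The per-transcript correction table can be sketched as counting distinct keys under each scheme. This is a hypothetical illustration, not the qRNASeq script: the LabeledPair record and its field names are assumptions, and it assumes STL-only correction keys on the label pair within a transcript, USS on (start, stop), and STL/USS on (start, stop, label pair).

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class LabeledPair(NamedTuple):
    # Hypothetical per-read-pair record; field names are illustrative only
    transcript: str
    start: int
    stop: int
    labels: Tuple[int, int]  # (label at one end, label at the other end)


def correction_table(pairs):
    """Per transcript: (total, unique after STL, unique after USS, unique after STL/USS)."""
    rows = defaultdict(lambda: {"total": 0, "stl": set(), "uss": set(), "stl_uss": set()})
    for p in pairs:
        row = rows[p.transcript]
        row["total"] += 1
        row["stl"].add(p.labels)                        # labels only
        row["uss"].add((p.start, p.stop))               # coordinates only
        row["stl_uss"].add((p.start, p.stop, p.labels))  # both combined
    return {
        tx: (r["total"], len(r["stl"]), len(r["uss"]), len(r["stl_uss"]))
        for tx, r in rows.items()
    }


pairs = [
    LabeledPair("TX1", 100, 350, (5, 17)),
    LabeledPair("TX1", 100, 350, (5, 17)),  # same ends, same labels: PCR duplicate
    LabeledPair("TX1", 100, 350, (42, 3)),  # same ends, different labels: distinct molecule
    LabeledPair("TX1", 200, 480, (5, 17)),  # different ends, reused label pair
]
print(correction_table(pairs))  # {'TX1': (4, 2, 2, 3)}
```

In this toy input, USS alone and labels alone each undercount (2 molecules), while the combined key recovers all 3 distinct molecules.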
In one example, using the qRNASeq script, we found that about 50% of the RNA-Seq read pairs eliminated by USS alone were removed improperly (Fig. 1). A qRNA-Seq library prepared from MCF-7 cell line RNA was paired-end sequenced on an Illumina HiSeq. USS analysis showed that 20% of properly aligned fragments had identical start and stop sites (nUSS) and would be eliminated from the library. However, additional analysis of the molecular indices revealed that about half of these fragments carried different combinations of molecular indices (nSTL + nUSS) and therefore should be retained. This method of RNA-Seq data analysis requires combining stochastic labels with the start/stop method, because molecular indices alone (nSTL) do not provide enough combinations for some highly expressed genes.
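Why labels alone saturate for highly expressed genes can be illustrated with a simple occupancy model. The sketch assumes each molecule of one transcript draws one of the 9216 label pairs independently and uniformly, and that label-only deduplication counts distinct label pairs; both are simplifying assumptions for illustration.

```python
LABEL_PAIRS = 96 * 96  # 9216 possible label combinations


def expected_distinct_labels(n_molecules, n_pairs=LABEL_PAIRS):
    """Expected number of distinct label pairs among n molecules, assuming
    independent, uniform label draws (classic occupancy model)."""
    return n_pairs * (1.0 - (1.0 - 1.0 / n_pairs) ** n_molecules)


for n in (100, 1_000, 10_000, 100_000):
    est = expected_distinct_labels(n)
    print(f"{n:>7} molecules -> ~{est:,.0f} distinct label pairs")
```

Under this model, label-only counts track the true molecule number at low expression but level off near 9216 as expression rises. Splitting molecules by start/stop coordinates first means the labels only need to distinguish molecules that share both ends.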
Figure 1. The number of unique fragments as determined by nSTL (molecular indexes), nUSS, and the combination of these two methods (nSTL + nUSS). The number of read pairs properly aligned and analyzed in this experiment was 500,849. Unique read pairs identified: 364,215 by nSTL, 399,510 by nUSS, and 455,391 by nSTL + nUSS.
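The percentages quoted in the text can be checked directly against the Figure 1 counts. This short calculation uses only the numbers given in the caption; the variable names are illustrative.

```python
aligned = 500_849
unique = {"nSTL": 364_215, "nUSS": 399_510, "nSTL + nUSS": 455_391}

for method, n in unique.items():
    removed = aligned - n
    print(f"{method}: {removed:,} removed ({removed / aligned:.1%} of aligned pairs)")

uss_removed = aligned - unique["nUSS"]            # pairs flagged as duplicates by USS alone
rescued = unique["nSTL + nUSS"] - unique["nUSS"]  # of those, pairs with distinct label pairs
print(f"Rescued by labels: {rescued:,} of {uss_removed:,} ({rescued / uss_removed:.1%})")
```

With these counts, USS alone removes 101,339 pairs (about 20% of aligned pairs), and the molecular labels rescue 55,881 of them (about 55%). This is consistent with the "about half" stated above.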
In conclusion, the new software for analysis of molecularly indexed data produced by the Bioo Scientific NEXTflex qRNA-Seq Kit V2 and the NEXTflex Rapid Directional qRNA-Seq Kit provides a convenient and robust package for accurate analysis of gene expression.
1. Poptsova, M. S., Il’icheva, I. A., Nechipurenko, D. Yu., Panchenko, L. A., Khodikov, M. V., Oparina, N. Y., Polozov, R. V., Nechipurenko, Yu. D., and Grokhovsky, S. L. (2014). Non-random DNA fragmentation in next-generation sequencing. Sci. Rep. 4:4532.
2. Casbon, J. A., Osborne, R. J., Brenner, S., and Lichtenstein, C. P. (2011). A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 39:e81.
3. Jabara, C. B., Jones, C. D., Roach, J., Anderson, J. A., and Swanstrom, R. (2011). Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc. Natl. Acad. Sci. USA 108:20166-20171.
4. Kivioja, T., Vähärautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S., and Taipale, J. (2012). Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9:72.
5. Fu, G. K., Xu, W., Wilhelmy, J., Mindrinos, M. N., Davis, R. W., Xiao, W., and Fodor, S. P. A. (2014). Molecular indexing enables quantitative targeted RNA sequencing and reveals poor efficiencies in standard library preparations. Proc. Natl. Acad. Sci. USA 111:1891-1896.
6. Schmitt, M. W., Kennedy, S. R., Salk, J. J., Fox, E. J., Hiatt, J. B., and Loeb, L. A. (2012). Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl. Acad. Sci. USA 109:14508-14513.
7. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W., and Vogelstein, B. (2011). Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl. Acad. Sci. USA 108:9530-9535.