Long-read transcriptomics data enables estimation of gene- and isoform-level expression with higher accuracy and confidence than ever before. Furthermore, Oxford Nanopore Technologies (ONT) sequencing allows the direct sequencing of RNA without Reverse Transcription (RT), as well as the sequencing of cDNA without PCR amplification.
Previously, RNA sequencing required both RT and PCR amplification. PCR amplification is known to induce biases associated with expression levels, transcript lengths and GC content. Biases introduced by the RT step are not as well described or quantified.
To gain further insight into how different aspects of RNA preparation bias the data, we compared ONT long-read sequencing of direct RNA (dRNA), direct cDNA and PCR-amplified cDNA using five cell-line data sets from SG-NEx. Across all cell lines we identified hundreds of Differentially Expressed (DE) genes and isoforms, indicating a significant change in the distribution of counts between preparation protocols. Furthermore, Differential Transcript Usage (DTU) was identified between protocols in hundreds of genes, indicating significant changes in the measured isoform proportions.
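To illustrate what a change in measured isoform proportions means, here is a minimal Python sketch. It is not the analysis used in this work: the function name, the transcript and gene identifiers, and the counts are all hypothetical, and a real DTU analysis would use replicate samples and a statistical test rather than a direct comparison of two count vectors.

```python
from collections import defaultdict


def isoform_proportions(counts, tx_to_gene):
    """Return each transcript's share of its gene's total count.

    counts: dict mapping transcript id -> read count (hypothetical data)
    tx_to_gene: dict mapping transcript id -> gene id
    """
    gene_totals = defaultdict(float)
    for tx, count in counts.items():
        gene_totals[tx_to_gene[tx]] += count

    proportions = {}
    for tx, count in counts.items():
        total = gene_totals[tx_to_gene[tx]]
        # Genes with no reads get proportion 0 for every isoform.
        proportions[tx] = count / total if total > 0 else 0.0
    return proportions


# Hypothetical gene with two isoforms measured under two protocols.
tx_to_gene = {"tx1": "geneA", "tx2": "geneA"}
drna_counts = {"tx1": 80, "tx2": 20}
cdna_counts = {"tx1": 100, "tx2": 100}

drna_props = isoform_proportions(drna_counts, tx_to_gene)
cdna_props = isoform_proportions(cdna_counts, tx_to_gene)

# Per-isoform shift in proportion between protocols.
shift = {tx: cdna_props[tx] - drna_props[tx] for tx in tx_to_gene}
print(shift)
```

In this toy example the gene total doubles between protocols, which is a change in expression, while the share of tx1 also drops from 0.8 to 0.5, which is a change in transcript usage.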
Next, we focused on genes significant for DTU between dRNA and cDNA samples, with a view to understanding biases that may occur in the structure of transcripts rather than just in measured expression. We devised an approach to identify and characterise the structural differences between pairs of isoforms (such as skipped exons, intron retention and extended exons). When we compared the distribution of structural differences identified between switching isoforms with that found between pairs of isoforms selected at random from non-significant DTU genes (background), we observed clear differences in isoform features. Some of these differences occur at the transcript ends and could be associated with RNA fragmentation. However, within the transcript body, the isoform-switching pairs contain significantly more inserted introns and fewer skipped exons in cDNA compared with the background pairs. This highlights a consistent pattern across all the cell lines tested, which could indicate the introduction of biases within the transcript body, possibly due to the Reverse Transcription step.
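As a rough illustration of how structural differences between a pair of isoforms might be characterised, the Python sketch below labels two event types from exon coordinates alone. It is an assumption-laden toy and not the approach devised in this work: the function names, the half-open coordinate convention and the example isoforms are hypothetical, and only skipped exons and intron retention are handled.

```python
from collections import Counter


def introns(exons):
    """Gaps between consecutive exons, given (start, end) half-open intervals."""
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def contained(inner, outer):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def structural_differences(iso_a, iso_b):
    """List events describing how iso_b differs from iso_a.

    Hypothetical, simplified rules:
    - skipped_exon: an exon of iso_a falls entirely inside an intron of iso_b
    - intron_retention: an intron of iso_a falls entirely inside an exon of iso_b
    """
    events = []
    introns_b = introns(iso_b)
    for exon in sorted(iso_a):
        if any(contained(exon, intron) for intron in introns_b):
            events.append(("skipped_exon", exon))
    for intron in introns(iso_a):
        if any(contained(intron, exon) for exon in iso_b):
            events.append(("intron_retention", intron))
    return events


# Hypothetical isoforms of one gene, as exon coordinate lists.
reference = [(100, 200), (300, 400), (500, 600)]
skipping = [(100, 200), (500, 600)]   # middle exon absent
retaining = [(100, 400), (500, 600)]  # first intron kept

skip_events = structural_differences(reference, skipping)
retain_events = structural_differences(reference, retaining)

# Tally event types across pairs, e.g. to compare switching pairs with background pairs.
tally = Counter(kind for kind, _ in skip_events + retain_events)
print(tally)
```

Tallies like this, computed separately for switching pairs and for background pairs, are the kind of distributions that could then be compared with a suitable statistical test.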
In experimental settings, these approaches can improve the interpretation of potential functional differences between transcripts in DTU genes. Detected structural differences can be aggregated and compared between experimental conditions to help identify specific mechanisms involved in isoform switching.
Dr Rotem Aharon holds a PhD in Mathematics and Statistics and an MSc in Applied Mathematics. She has extensive industry experience applying mathematics, statistics, machine learning, and deep learning principles to advance data analysis methodologies and design monitoring functions for computational models in production. Rotem currently works at Peter MacCallum Cancer Centre as a member of the Oshlack lab in the Computational Biology program, where her research focuses on developing novel analysis approaches and collaborative projects involving long-read transcriptomics data.