Wei Shi - Projects


Mapping of next-generation sequencing (NGS) reads

Mapping NGS reads to a reference genome is often the first step in genomic applications that use NGS technologies. We have developed a ‘seed-and-vote’ read mapping paradigm that greatly speeds up read mapping and improves mapping accuracy.
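To give a feel for the general seed-and-vote idea, here is a minimal Python sketch. It is a toy illustration, not the Subread implementation: the function names (build_index, seed_and_vote), the seed length of 4 and the example sequences are all assumptions made for this example. Short seeds taken from the read are looked up in a hash index of the reference, each hit votes for the read start position it implies, and the position with the most votes is reported.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step=1):
    """Return (best_start, votes) for the read, or (None, 0) if no seed hits.

    Each seed is an exact k-mer taken from the read at a given offset.
    A hit at reference position p votes for the read starting at p - offset.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        seed = read[offset:offset + k]
        for ref_pos in index.get(seed, ()):
            votes[ref_pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Hypothetical toy data: the read is a copy of reference[10:24]
# with one mismatch introduced at read offset 7.
reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGA"
read = "CCGATAGGTAGGCT"

index = build_index(reference, k=4)
start, n_votes = seed_and_vote(read, index, k=4)
print(start, n_votes)
```

Seeds that cover the mismatch fail to vote for the true location, but the remaining seeds still agree on it, which is why voting tolerates mismatches without requiring a full alignment of every candidate.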

We have released a generic read aligner called Subread and an RNA-seq specific aligner called Subjunc.

Sequencing reads are becoming longer as NGS technologies evolve, and long reads are particularly useful for detecting full transcripts and structural variants. Our ‘seed-and-vote’ strategy is highly scalable and can be readily extended to the mapping of long reads.

Quantifying abundances of genomic features

After reads are mapped to a reference genome, they need to be assigned to genomic features to produce read counts. This enables downstream analyses, for example gene expression analysis and histone modification analysis.

The genomic features may include genes, transcripts, exons and promoters.

Assigning reads to genomic features is known to be a computationally intensive operation. We developed a hierarchical search algorithm that quickly locates the features overlapping each mapped read and then counts reads for each feature. We implemented this algorithm in a software program called featureCounts, which is currently one of the most popular read counting tools in the field.
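The sketch below illustrates the general idea of narrowing the search before testing for overlap. It is a simplified stand-in, not the algorithm used in featureCounts: features are first grouped by chromosome and by fixed-size genomic bins, and a read is compared only against features in the bins it touches. The names (build_bins, count_reads), the bin size, the coordinates and the rule that a read overlapping more than one feature is left uncounted are all assumptions made for this example.

```python
from collections import defaultdict

BIN_SIZE = 1000  # hypothetical bin width in bases


def build_bins(features):
    """Group features by (chromosome, bin) so lookups touch few candidates.

    features: iterable of (name, chrom, start, end), coordinates inclusive.
    """
    bins = defaultdict(list)
    for name, chrom, start, end in features:
        for b in range(start // BIN_SIZE, end // BIN_SIZE + 1):
            bins[(chrom, b)].append((name, start, end))
    return bins


def count_reads(reads, bins):
    """Count reads per feature; reads: iterable of (chrom, start, end)."""
    counts = defaultdict(int)
    for chrom, r_start, r_end in reads:
        hits = set()
        for b in range(r_start // BIN_SIZE, r_end // BIN_SIZE + 1):
            for name, f_start, f_end in bins.get((chrom, b), ()):
                # Two inclusive intervals overlap if each starts
                # before the other ends.
                if r_start <= f_end and f_start <= r_end:
                    hits.add(name)
        # Assumption for this sketch: ambiguous reads are not counted.
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return dict(counts)


# Hypothetical toy annotation and mapped reads.
features = [
    ("geneA", "chr1", 100, 500),
    ("geneB", "chr1", 450, 1200),
    ("geneC", "chr2", 100, 300),
]
reads = [
    ("chr1", 120, 170),    # inside geneA only
    ("chr1", 460, 510),    # overlaps geneA and geneB, so it is skipped
    ("chr1", 1100, 1150),  # inside geneB only
    ("chr2", 50, 99),      # ends just before geneC
    ("chr2", 250, 320),    # overlaps geneC
]

counts = count_reads(reads, build_bins(features))
print(counts)
```

Binning keeps the number of overlap tests per read small, because most features on the same chromosome are never examined.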

Detection of genomic mutations in cancer genomes

It is known that genomic mutations such as structural variants, short indels and SNPs can cause cancer and other diseases. Next-generation sequencing (NGS) technologies enable these mutations to be detected at single-nucleotide resolution.

We aim to improve detection of structural variants and short indels by discovering such events within the read mapping process, instead of doing so after read mapping is completed.

The Subread and Subjunc aligners support detection of these genomic events during read mapping. We have also developed a context-based SNP caller called ExactSNP.

Using a genomic approach to study lymphocyte differentiation in the adaptive immune system

In collaboration with immunology laboratories (Nutt, Kallies, Corcoran, Belz, Huntington and Hodgkin) in the institute, we use genomic sequencing technologies to profile global gene expression changes at different stages of lymphocyte differentiation.

We are interested in deciphering the gene regulatory networks that control the differentiation of B cells, natural killer cells and regulatory T cells. We have successfully discovered a signature for antibody-secreting plasma cells, the end point of the B-cell lineage.

We are also interested in discovering genes and genomic mutations implicated in immune diseases such as lupus and lymphoma.