Our lab focuses on statistical bioinformatics and gene expression.
Modern genomic technologies produce huge amounts of data that allow us to examine gene activity on a genome-wide scale. We can observe which genes are turned on and how active they are in any type of cell at any time.
My research group develops advanced computational and statistical strategies to analyse and interpret these large data sets. In collaboration with other scientists at the institute, we examine which genes play essential roles in normal cell development and which genes are disrupted or inappropriately activated in particular diseases.
Our goal is to learn how diseases originate by examining how genetic disruption comes about and how it might be controlled.
Our lab pioneered the use of statistical methods such as linear models, empirical Bayes methods and generalized linear models for modelling gene expression data from high-throughput genomic technologies.
We created the limma, edgeR, Rsubread, csaw and diffHic software packages, which form a key part of the international Bioconductor project for genomic software. The limma and edgeR packages are the world's most downloaded R packages for the statistical analysis of RNA-seq and microarray data.
Both limma and edgeR are widely used around the world for analysing gene expression experiments. Among our current interests are differential expression at the transcript level and the detection of differential splicing between experimental conditions.
We are currently extending both the limma and edgeR packages to handle data from single-cell RNA-seq (scRNA-seq) technologies. Another current interest is the assessment of differential expression relative to biological variation.
Technical advances in mass spectrometry have made high-throughput quantitative proteomics an increasingly popular tool for medical researchers, but statistical tools for analysing proteomics data remain far less developed than those for microarrays or RNA-seq. The key complication preventing straightforward differential analysis of proteomics data is the frequent occurrence of missing values, which arise from peptides that cannot be quantified in certain samples. We are developing a new approach to the differential analysis of proteomics data based on a probabilistic representation of the missing value mechanism.
We developed the diffHic methodology to evaluate changes in chromatin conformation and enhancer interactions between experimental conditions. We continue to apply this methodology in experiments to understand autoimmune disease, breast cancer and immune cell differentiation. We also integrate Hi-C data with epigenomic data.
We have developed a number of gene set test methods for assessing the behaviour of co-regulated sets of genes representing higher-level biological processes, which we apply to studies of stem cells, breast cancer, lung cancer and other diseases. One of our priorities is the development of pre-ranked gene set enrichment methods that take account of inter-dependence between genes.
Our research is motivated and informed by collaborations with biomedical colleagues. Over the next five years, we will analyse, integrate and interpret data from a wide range of genomic technologies as part of collaborative projects on cancer, immunology and development. Continuing projects include: stem cells and the genesis of breast cancer; chromatin conformation and autoimmune disease; identifying therapeutic targets for predominant antibody deficiencies; the role of the MYST gene family in development; defining genes that dictate different cellular outcomes in response to TP53 activation; and profiling mechanisms of gene regulation during B cell differentiation.
Our team includes graduates from a range of disciplines including statistics, mathematics, computer science, genetics and engineering.
Applications are welcome from students with a strong background in any of these areas who are motivated to work on biomedical problems.