We analyse data from a number of genomic technologies, especially RNA sequencing (RNA-seq), but also DNA sequencing, gene expression microarrays, protein arrays, mass spectrometry and high-throughput PCR arrays. One of our key interests is the identification of genes, transcripts or molecular pathways that are differentially expressed between experimental conditions. We also analyse ChIP sequencing (ChIP-seq) experiments to detect changes in epigenetic marks and DNA structure.
We develop high-performance algorithms to map short sequence reads to a reference genome. We use mathematical techniques such as linear modelling and empirical Bayes methods to borrow strength between genes and between experimental units, providing robust statistical conclusions even when the number of experimental units is relatively small.
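As an illustration of borrowing strength between genes, the following minimal sketch computes a moderated t-statistic for one gene in the style of Smyth (2004): the gene's residual variance s² on d degrees of freedom is shrunk towards a prior value s0² that carries d0 prior degrees of freedom. In limma the hyperparameters d0 and s0² are estimated from all genes; here they are simply passed in as assumed values, and the function name `moderated_t` is hypothetical.

```python
import math
from statistics import mean, variance


def moderated_t(group_a, group_b, d0, s0_sq):
    """Moderated two-sample t-statistic for one gene (sketch).

    d0 and s0_sq are prior hyperparameters assumed to be known here;
    in practice they would be estimated from all genes.
    Returns the statistic and its total degrees of freedom.
    """
    na, nb = len(group_a), len(group_b)
    # Residual degrees of freedom and pooled sample variance for this gene
    d = na + nb - 2
    s_sq = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / d
    # Posterior variance: weighted average of the prior and the gene's own variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    # Difference in means (e.g. a log-fold change) and its unscaled variance factor
    beta = mean(group_b) - mean(group_a)
    v = 1 / na + 1 / nb
    return beta / math.sqrt(s_tilde_sq * v), d0 + d


t, df = moderated_t([1, 2, 3], [4, 5, 6], d0=4, s0_sq=4)
print(t, df)
```

With d0 = 0 the statistic reduces to the ordinary pooled two-sample t-statistic; a larger d0 pulls the variance further towards s0², which stabilises inference when each gene has only a few replicates.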
We collaborate closely with institute scientists on a range of human diseases including breast cancer, lung cancer, multiple sclerosis and various immunological disorders.
Australia, University of Western Australia, BSc (Hons), 1978
Australia, Australian National University, PhD, 1986
2023, Julian Wells Medal, Lorne Genome Inc
2021, Fellow, Australian Academy of Science
2020, Honorary Senior Fellow, ABACBS
2020, Bioconductor Award, Bioconductor
2019, Open Science Award, ABACBS
2013-2022, Highly Cited Researcher, Web of Science
Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. PMID: 25605792
Lun, ATL, and Smyth, GK (2014). De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly. Nucleic Acids Research 42, e95. PMID: 24852250
Law, CW, Chen, Y, Shi, W, and Smyth, GK (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. PMID: 24485249
Wu, D, and Smyth, GK (2012). Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40, e133. PMID: 22638577
McCarthy, DJ, Chen, Y, and Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. PMID: 22287627
Wu, D, Lim, E, Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176-2182. PMID: 20610611
Robinson, MD, McCarthy, DJ, and Smyth, GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140. PMID: 19910308
Robinson, MD, and Smyth, GK (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881-2887. PMID: 17881408
Smyth, GK, Michaud, J, and Scott, H (2005). Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075. PMID: 15657102
Smyth, GK (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3(1), Article 3. PMID: 16646809
We have developed the limma and edgeR software packages, which are widely used around the world for analysing gene expression experiments. Among our current interests are differential expression at the transcript level and the detection of differential splicing between experimental conditions.
We are currently extending both the limma and edgeR packages to handle data from single-cell RNA-seq (scRNA-seq) technologies. A particular interest is the assessment of differential expression relative to biological variation.
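The negative binomial model described by McCarthy, Chen and Smyth (2012) gives Var(y) = μ + φμ² for a count with mean μ, so the squared coefficient of variation splits as CV² = 1/μ + φ: a Poisson (technical) term plus the dispersion φ, whose square root is the biological coefficient of variation (BCV). The sketch below estimates φ for one gene by the method of moments. It is a deliberately naive estimator that assumes equal library sizes, not the empirical Bayes dispersion estimation that edgeR performs, and `moment_bcv` is a hypothetical name.

```python
from statistics import mean, variance


def moment_bcv(counts):
    """Naive method-of-moments dispersion and BCV for one gene (sketch).

    Assumes equal library sizes across replicates and a positive mean count.
    Returns (phi, bcv).
    """
    mu = mean(counts)
    s_sq = variance(counts)  # sample variance with n - 1 denominator
    # Negative binomial: Var(y) = mu + phi * mu^2, so phi = (Var - mu) / mu^2
    phi = max((s_sq - mu) / mu ** 2, 0.0)  # truncate at zero when under-dispersed
    return phi, phi ** 0.5


phi, bcv = moment_bcv([80, 100, 120, 100])
print(phi, bcv)
```

Here the technical term 1/μ is 0.01 at a mean count of 100, so for well-expressed genes most of the variability between replicates comes from the biological term φ.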
Technical advances in mass spectrometry have made high-throughput quantitative proteomics an increasingly popular tool for medical researchers, but statistical tools for analysing proteomics data remain far less developed than those for microarrays or RNA-seq. The key complication preventing straightforward differential analyses of proteomics data is the frequent appearance of missing values arising from peptides that cannot be quantified in certain samples. We are developing a new approach for proteomics based on a probabilistic representation of the missing value mechanism.
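To make the idea of modelling the missing value mechanism concrete, here is a hypothetical sketch, not the approach under development. It assumes a peptide's log-intensity y follows N(μ, σ²) and that the peptide is detected with probability Φ(a + b·y), a probit curve under which low-intensity peptides are more likely to go missing. With that assumption the probability of a missing value integrates to Φ(−(a + bμ)/√(1 + b²σ²)) in closed form, so missing observations still contribute information about μ instead of being discarded. All names and parameters (`peptide_loglik`, `a`, `b`) are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def peptide_loglik(values, mu, sigma, a, b):
    """Log-likelihood for one peptide's log-intensities; None marks a missing value.

    Hypothetical selection model: detection probability is Phi(a + b * y)
    for an underlying log-intensity y ~ N(mu, sigma^2).
    """
    # Marginal probability of non-detection, integrating over y
    p_missing = norm_cdf(-(a + b * mu) / math.sqrt(1.0 + b * b * sigma * sigma))
    ll = 0.0
    for y in values:
        if y is None:
            ll += math.log(p_missing)
        else:
            # Observed value: its density times the probability it was detected
            ll += norm_logpdf(y, mu, sigma) + math.log(norm_cdf(a + b * y))
    return ll


print(peptide_loglik([21.3, None, 20.8, None], mu=20.0, sigma=1.0, a=-18.0, b=1.0))
```

Maximising such a likelihood over μ would shift the estimate downwards when many values are missing, which is the intuition behind treating missingness as informative rather than ignorable.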
We developed the diffHic methodology to evaluate changes in chromatin conformation and enhancer interactions between experimental conditions. We continue to apply this methodology in experiments to understand autoimmune disease, breast cancer and immune cell differentiation. We also integrate Hi-C data with epigenomic data.
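Differential Hi-C analyses commonly begin by counting read pairs into pairs of fixed-width genomic bins, after which each bin pair can be tested for differential interaction much like a gene in an RNA-seq experiment. The sketch below shows only that counting step for a single chromosome, with each read pair given by the positions of its two ends. It is a simplified illustration, not diffHic's implementation, and `count_bin_pairs` is a hypothetical name.

```python
from collections import Counter


def count_bin_pairs(read_pairs, bin_width):
    """Count Hi-C read pairs into bin pairs on one chromosome (sketch).

    read_pairs: iterable of (pos1, pos2) genomic positions of the two ends.
    Returns a Counter keyed by (larger_bin, smaller_bin).
    """
    counts = Counter()
    for pos1, pos2 in read_pairs:
        b1, b2 = pos1 // bin_width, pos2 // bin_width
        # Order the bins so that (i, j) and (j, i) describe the same interaction
        counts[(max(b1, b2), min(b1, b2))] += 1
    return counts


print(count_bin_pairs([(150, 950), (920, 120), (50, 60)], bin_width=100))
```

One such table per library, aligned on a common set of bin pairs, gives a count matrix that count-based tools such as edgeR can analyse.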
We have developed a number of gene set test methods for assessing the behaviour of co-regulated sets of genes representing higher-level biological processes, which we apply to analyses of stem cells, breast cancer, lung cancer and other diseases. One of our priorities is the development of pre-ranked gene set enrichment methods that take account of inter-dependence between genes.
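The correlation adjustment behind camera (Wu and Smyth 2012) can be illustrated briefly: if the m genes in a set have average pairwise correlation ρ̄, the variance of their mean statistic is inflated by the factor 1 + (m − 1)ρ̄ relative to independent genes. The simplified competitive test below compares the mean gene-level statistic inside the set with the mean outside it and applies that variance inflation factor. Here ρ̄ is assumed to be supplied rather than estimated, and `camera_like_z` is a hypothetical sketch rather than limma's camera code.

```python
from statistics import mean, variance


def camera_like_z(stats, in_set, rho):
    """Competitive gene set statistic with correlation-based variance inflation.

    stats  : gene-level statistics (e.g. moderated t) for all genes
    in_set : booleans marking genes that belong to the set
    rho    : assumed average inter-gene correlation within the set
    """
    inside = [s for s, flag in zip(stats, in_set) if flag]
    outside = [s for s, flag in zip(stats, in_set) if not flag]
    m, n = len(inside), len(outside)
    # Pooled within-group variance of the statistics (needs m >= 2 and n >= 2)
    pooled = ((m - 1) * variance(inside) + (n - 1) * variance(outside)) / (m + n - 2)
    # Correlated genes: Var(mean of set) = pooled * (1 + (m - 1) * rho) / m
    vif = 1 + (m - 1) * rho
    se = (pooled * (vif / m + 1 / n)) ** 0.5
    return (mean(inside) - mean(outside)) / se


stats = [2, 3, 4, 0, 1, -1, 0]
flags = [True, True, True, False, False, False, False]
print(camera_like_z(stats, flags, 0.0), camera_like_z(stats, flags, 0.5))
```

With ρ̄ = 0 this reduces to an ordinary pooled two-sample statistic. For a large set, vif/m approaches ρ̄ rather than zero, so even a small positive correlation limits how significant the set can appear; ignoring it tends to overstate significance.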
Our research is motivated and informed by collaborations with biomedical colleagues. Over the next five years, we will analyse, integrate and interpret data from a wide range of genomic technologies as part of collaborative projects on cancer, immunology and development. Continuing projects include stem cells and the genesis of breast cancer, chromatin conformation and autoimmune disease, identifying therapeutic targets for predominant antibody deficiencies, the role of the MYST gene family in development, defining genes that dictate different cellular outcomes in response to TP53 activation, and profiling mechanisms of gene regulation during B cell differentiation.