Our research is motivated and informed by collaborations with biomedical colleagues. Over the next 5 years, we will analyse, integrate and interpret data from a wide range of genomic technologies as part of collaborative projects on cancer, immunology and development. Continuing projects include stem cells and the genesis of breast cancer, chromatin conformation and auto-immune disease, identifying therapeutic targets for predominant antibody deficiencies, the role of the MYST gene family in development, defining genes that dictate different cellular outcomes in response to TP53 activation and profiling mechanisms of gene regulation during B cell differentiation.

Prof Gordon Smyth, Division Head | WEHI Researcher Profile

Q: Analysis of RNA sequencing data

We developed the limma and edgeR software packages, which are widely used around the world for analysing gene expression experiments. Our current interests include differential expression at the transcript level and detection of differential splicing between experimental conditions.

Q: Analysis of single cell RNA sequencing data

We are currently extending both the limma and edgeR packages to handle data from scRNA-seq technologies. Among our current interests is the assessment of differential expression relative to biological variation.

Q: Analysis of proteomics data

Technical advances in mass spectrometry have made high-throughput quantitative proteomics an increasingly popular tool for medical researchers, but statistical tools for analysing proteomics data remain far less developed than those for microarrays or RNA-seq. The key complication preventing straightforward differential analyses of proteomics data is the frequent appearance of missing values, which arise when peptides cannot be quantified in certain samples. We are developing a new approach for proteomics based on a probabilistic representation of the missing value mechanism.
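
As a rough illustration of what a probabilistic missing value mechanism can look like, the sketch below assumes that the probability of observing a peptide follows a logistic curve in its log-intensity. The function name `detection_probability` and the parameter values `B0` and `B1` are hypothetical choices made for illustration; they are not estimates from any dataset and not the interface of any package.

```python
import math


def detection_probability(log_intensity: float, b0: float, b1: float) -> float:
    """Probability that a peptide is observed, assuming a logistic
    dependence on its log-intensity (an illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * log_intensity)))


# Hypothetical parameters: detection becomes more likely as intensity rises.
B0, B1 = -10.0, 1.0

for y in (6.0, 10.0, 14.0):
    p = detection_probability(y, B0, B1)
    print(f"log-intensity {y:4.1f}: P(observed) = {p:.3f}")
```

Under a model of this kind, missing values are neither completely random nor simply censored below a fixed threshold: low-intensity peptides are more likely to be missing, but not certain to be.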

Q: Analysis of Hi-C data, chromatin interactions and epigenetics

We developed the diffHic methodology to evaluate changes in chromatin conformation and enhancer interactions between experimental conditions. We continue to apply this methodology in experiments to understand auto-immune disease, breast cancer and immune cell differentiation. We also integrate Hi-C with epigenome data.

Q: Expression signature analysis

We have developed a number of gene set test methods for assessing the behaviour of co-regulated sets of genes representing higher-level biological processes, which we apply to analyses of stem cells, breast cancer, lung cancer and other diseases. One of our priorities is the development of pre-ranked gene set enrichment methods that take account of inter-dependence between genes.
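
To illustrate why inter-dependence between genes matters for gene set tests, the sketch below computes how correlation inflates the variance of a set's mean statistic relative to the independent case. It assumes that every gene-level statistic has the same variance and that every pair of genes in the set shares the same correlation `rho`. The function name is hypothetical and the numbers are illustrative only.

```python
def variance_inflation_factor(set_size: int, rho: float) -> float:
    """Factor by which the variance of the mean of `set_size` equally
    correlated statistics exceeds the variance under independence.

    Assumes a common pairwise correlation `rho` (an illustrative
    simplification): Var(mean) = (sigma^2 / m) * (1 + (m - 1) * rho).
    """
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    return 1.0 + (set_size - 1) * rho


# Even a small average correlation has a large effect in a big gene set.
for m in (10, 50, 200):
    vif = variance_inflation_factor(m, rho=0.05)
    print(f"set size {m:3d}: variance inflation = {vif:.2f}")
```

For a set of 50 genes with a common pairwise correlation of 0.05, the factor is 3.45, so a test that treats the genes as independent would understate the variance of the set mean by more than a factor of three.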

About

I am a statistical bioinformatician working on gene expression and on the regulatory mechanisms that control gene expression. I develop novel computational and statistical methods for analysing data from high-throughput molecular technologies. I implement these methods in publicly available software tools that have become international standards. Together with collaborators, I use these methodologies to make biomedical discoveries of significance to cancer, immunology and infectious diseases.

Publications

Selected publications from Prof Gordon Smyth

Baldoni PL, Chen L, Smyth GK. Faster and more accurate assessment of differential transcript expression with Gibbs sampling and edgeR v4. NAR Genomics and Bioinformatics. 2024;6(4):10.1093/nargab/lqae151

Baldoni PL, Chen Y, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK. Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research. 2024;52(3):10.1093/nar/gkad1167

Li M, Smyth GK. Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics. Bioinformatics. 2023;39(5):10.1093/bioinformatics/btad200

Chen Y, Pal B, Lindeman GJ, Visvader JE, Smyth GK. R code and downstream analysis objects for the scRNA-seq atlas of normal and tumorigenic human breast tissue. Scientific Data. 2022;9(1):10.1038/s41597-022-01236-2

Liao Y, Smyth GK, Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Research. 2019;47(8):10.1093/nar/gkz114

Chen Y, Pal B, Visvader JE, Smyth GK. Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR. F1000Research. 2017;6:10.12688/f1000research.13196.1

Chen Y, Pal B, Visvader JE, Smyth GK. Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR. F1000Research. 2017;6:10.12688/f1000research.13196.2

Chen Y, Lun ATL, Smyth GK. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research. 2016;5:10.12688/f1000research.8987.1

Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. The Annals of Applied Statistics. 2016;10(2):10.1214/16-aoas920

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43(7):10.1093/nar/gkv007