For over a century, tissue biopsies from cancer patients have been stained with Haematoxylin and Eosin (H&E) for pathologists to examine by light microscopy. Our research focuses on improving H&E analysis through multiple approaches. Recently, spatial transcriptomic (ST) imaging and sequencing data have enabled us to link tissue morphological features in an H&E image with thousands of unseen gene expression values, opening a new horizon for understanding tissue biology and achieving breakthroughs in digital pathology. We developed STimage as a comprehensive suite of models for predicting gene expression and classifying tissue regions and cell types. For robustness, STimage predicts gene expression based on parameter distributions rather than fixed point estimates, and it estimates uncertainty arising both from the data (aleatoric) and from the model (epistemic). STimage achieves interpretability by analysing model attribution at the single-cell level, in the context of histopathological annotation and functional genes, and by characterising the latent representation. Using diverse datasets from three cancers and one chronic disease, we assessed the model’s performance on in-distribution and out-of-distribution samples, across platforms, data types, and sample preservation methods. Further, we implemented an ensemble approach, incorporating pre-trained foundation models, to improve performance and reliability, especially when training datasets are small. Finally, we showed that STimage-predicted values, based solely on imaging input, could stratify patients into survival groups. Overall, STimage enables the prediction of molecular and cellular information from histopathological images, offering a new direction for advancing digital pathology applications.
Assoc. Prof. Quan Nguyen is the head of the Genomics and Machine Learning Lab at the QIMR Berghofer Medical Research Institute, where he leads the QIMRB National Centre for Spatial Tissue and AI Research (NCSTAR). He completed a PhD in Bioengineering at UQ in 2013, a postdoc in bioinformatics at RIKEN in Japan in 2015, a CSIRO OCE Fellowship in 2016, and an Australian Research Council DECRA Fellowship in 2021, and he is a National Health and Medical Research Council Emerging Leadership Fellow (EL2). His lab uses spatial multiomics and machine learning analysis to find biomarkers of disease, understand cellular interactions, predict disease progression, and stratify responses to drugs.