Technological advances in sequencing have generated large-scale cancer genomics datasets, creating both new opportunities and substantial computational challenges. In the first part of this talk, I will introduce the MetaGraph framework, a scalable approach for representing and querying cohort-level sequencing data using graph-based indexes. I will discuss its performance across diverse cohorts and demonstrate applications to RNA-Seq and DNA-Seq data, including analyses of gene expression, alternative splicing and trans-splicing, and structural variation. In the second part, I will focus on the complementary challenge of generating realistic sequencing data for benchmarking new computational methods. Drawing on my work in the ICGC benchmarking working group, I will outline the need for high-fidelity simulation of DNA and RNA sequencing data. I will then present an overview of an emerging simulation framework that captures key biological and technical characteristics of real data, and discuss how such simulated data can support the robust evaluation of genomic analysis tools.
Andre Kahles is a bioinformatician and senior scientist at ETH Zurich’s Department of Computer Science, where he leads research in clinical and population-scale genomics within the Biomedical Informatics Lab. His work focuses on developing algorithms and data structures for the analysis of large-scale sequencing data, with particular emphasis on transcriptomics, cancer genomics, and alternative splicing. He has contributed to major international cancer genomics consortia, including ICGC and TCGA, with a focus on transcriptome variation across diverse cohorts. His research bridges methodological innovation and biomedical application, enabling scalable and robust analysis of increasingly large and complex sequencing datasets.