The Stewart Computational Group is driven by two goals: (1) building foundational datasets, models, and algorithms useful to other researchers and (2) finding patterns in large datasets to establish targets for further wet-lab analysis. To these ends, we develop and apply algorithms for the analysis of large biomedical omics datasets, including genomics, transcriptomics, epigenomics, multi-omics, and bibliomics (biomedical text). Collaboration is at the heart of what we do: through the projects described below, we work with researchers from the Morgridge Institute for Research, the University of Wisconsin, and other institutions around the world.
- Analysis of omics data to identify genes and pathways involved in a disease or condition
We use existing algorithms and tools developed in-house to identify genes and pathways implicated in a variety of diseases and conditions, from COVID-19 severity, to responses to vaping, to diabetic retinopathy.
- Developing algorithms for analysis of high-throughput omics data
To make sense of the vast amount of data generated by omics and multi-omics studies, researchers require efficient, effective, and accurate computational approaches. We develop such algorithms to help researchers make more accurate and meaningful predictions about targets for further wet-lab analysis.
Some recent examples of our work include:
- CHARTS: a web application for characterizing and comparing tumor subpopulations based on single-cell RNA-seq data.
- SpatialCorr: an algorithm for finding correlations between pairs or groups of genes in spatial transcriptomic data. We are using SpatialCorr in the context of cancer and tumor heterogeneity, but it could be applied to any spatial transcriptomic dataset.
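As a rough illustration of what a spatially varying correlation between two genes can look like, the sketch below computes a Gaussian-kernel-weighted Pearson correlation around a chosen spot. It is not SpatialCorr's method or API: the function name `local_correlation`, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for this example, and the data are toy values.

```python
from math import exp, sqrt


def local_correlation(coords, x, y, center, bandwidth=1.0):
    """Kernel-weighted Pearson correlation of two genes around one spot.

    coords    -- list of (row, col) spot positions
    x, y      -- expression values of the two genes, one per spot
    center    -- index of the spot to centre the kernel on
    bandwidth -- Gaussian kernel width (hypothetical tuning parameter)
    """
    cx, cy = coords[center]
    # Spots close to the centre get weights near 1; distant spots near 0.
    w = [exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2 * bandwidth ** 2))
         for px, py in coords]
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant gene
    return cov / sqrt(var_x * var_y)


# Toy tissue: two distant regions where the gene pair behaves differently.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
gene_a = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
gene_b = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]

r_left = local_correlation(coords, gene_a, gene_b, center=1)
r_right = local_correlation(coords, gene_a, gene_b, center=4)
print(r_left, r_right)
```

In this toy layout the two genes rise together in one region and move in opposite directions in the other, so a single global correlation would hide a pattern that the local estimates expose.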
- Building high-quality datasets as a springboard for further analysis
High-quality “reference” genomes have become a pillar of modern genomics, as they provide the basis for a variety of molecular and genetic approaches to studying important biological characteristics across species. Our group has led the effort to assemble such genomes for three species:
- Blue Whale: the largest animal that has ever lived. This genome is useful to researchers interested in body size control, cancer, and conservation.
- Etruscan Shrew: the smallest mammal by mass, about twenty times lighter than an adult mouse. This genome is useful to researchers interested in body size control, metabolism, and possibly aging.
- Nile Rat: an important model for diabetes, as the species can mimic the natural progression of type 2 diabetes in humans much more closely than other common rodent models. The Nile rat is unusual among rodents in that it is active during the day (diurnal) and thus is also useful for studying diurnal circadian rhythms. We have also provided transcriptomic datasets for 22 organs in the Nile rat, which will be useful for understanding the roles of various organs in the development of diabetes and other diseases.
- Developing and utilizing text mining algorithms
PubMed contains millions of abstracts, and more text, biomedical and beyond, is generated all the time. Methods for analyzing text are increasingly recognized as critical for biomedical research: they leverage an orthogonal data type that can augment the data generated by wet labs, such as genomic or transcriptomic data. We develop and utilize a variety of text mining algorithms. Some examples include:
- KinderMiner and Serial KinderMiner (SKiM). KinderMiner is a co-occurrence-based algorithm for finding associations between two terms or concepts (such as between a gene and a disease), and its serial form, SKiM, can be used to find new connections between two terms or concepts via a third intermediary.
- Large Language Models (LLMs). We are fine-tuning LLMs to perform named entity recognition and relationship extraction to build biomedical knowledge graphs. We are employing prompt engineering and other methods on generative LLMs (like GPT-4) to summarize and reason about biomedical text.
- Combinatory techniques. We use LLM-based methods to augment and complement the co-occurrence modeling performed with KinderMiner and SKiM for tasks such as drug discovery and drug repurposing, for example to find drugs and drug combinations potentially useful for various cancers. We are also combining the text-based methods with other omics datasets (such as transcriptomics) to further refine predictions about genes, pathways, or drugs that might be important for a particular disease, condition, or cell state.
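To make the co-occurrence idea behind KinderMiner more concrete, here is a minimal sketch that counts how often a target term and a key phrase appear in the same abstract and scores the overlap with a one-sided Fisher's exact test. It is an illustration under stated assumptions, not the group's code: the function names `cooccurrence_table` and `fisher_one_sided` are hypothetical, matching is simple case-insensitive substring search, and the four "abstracts" are invented toy strings.

```python
from math import comb


def cooccurrence_table(docs, target, key_phrase):
    """Count documents by which of the two terms they mention.

    Returns (both, target_only, key_only, neither).
    Matching is naive case-insensitive substring search (an assumption).
    """
    both = target_only = key_only = neither = 0
    t, k = target.lower(), key_phrase.lower()
    for doc in docs:
        text = doc.lower()
        has_t, has_k = t in text, k in text
        if has_t and has_k:
            both += 1
        elif has_t:
            target_only += 1
        elif has_k:
            key_only += 1
        else:
            neither += 1
    return both, target_only, key_only, neither


def fisher_one_sided(both, target_only, key_only, neither):
    """P(co-occurrence count >= observed) under the hypergeometric null."""
    n = both + target_only + key_only + neither
    with_target = both + target_only
    with_key = both + key_only
    denom = comb(n, with_key)
    upper = min(with_target, with_key)
    return sum(
        comb(with_target, x) * comb(n - with_target, with_key - x)
        for x in range(both, upper + 1)
    ) / denom


# Invented toy corpus, for illustration only.
docs = [
    "BRCA1 mutations raise breast cancer risk.",
    "Loss of BRCA1 function in hereditary breast cancer.",
    "TP53 regulates the cell cycle.",
    "Insulin signaling in type 2 diabetes.",
]
table = cooccurrence_table(docs, "BRCA1", "breast cancer")
p_value = fisher_one_sided(*table)
print(table, p_value)
```

A small p-value suggests the two terms appear together more often than chance would predict. A serial search in the spirit of SKiM could apply the same test twice, first from a starting term to candidate intermediaries and then from each intermediary to candidate end terms.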
See https://github.com/stewart-lab/ for some of our publicly available code.