I use computational methods to help us make sense of very large biological datasets.
Today, biologists can generate billions of data points overnight, far more than anyone can make sense of with just a spreadsheet. We write and use sophisticated computational algorithms to interpret the data.
Our first goal is to help scientists understand the large datasets they create or download from the public domain. A second goal is to help them decide which future experiments are most likely to deliver results.
An example of the first goal is processing and understanding the high-throughput data coming off our DNA sequencer. The data can represent many things, such as the genes expressed in cells or tissues, or the regions of the genome that regulate gene expression. We create new algorithms for processing and understanding the data so that we can partner with biologists to make discoveries that improve human health. We have used these methods to assist Morgridge scientists as they investigate questions in regenerative and vascular biology, and in other collaborations on eye diseases, blood diseases and heart health.
An example of the second goal is an algorithm we have developed called “KinderMiner.” This simple yet powerful algorithm computationally reads all 30 million papers available in the PubMed dataset, and provides suggestions to scientists about which targets to explore in their next set of experiments.
Experiments are expensive, and it is impossible for scientists to perform experiments on all the possible targets. For instance, for cellular reprogramming, you turn genes on in a cell to reprogram the cell into another cell type that might be useful for therapy. Which genes do you turn on? Reprogramming usually requires a combination of genes, and combinatorial problems become difficult very quickly.
A typical reprogramming task might require about 2 million experiments to cover all the combinations, which is completely intractable for biologists. KinderMiner can help prioritize the genes so that only about 200 experiments are required. This narrows the playing field, making otherwise infeasible experiments possible.
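To see how quickly the combinations pile up, here is a minimal sketch in Python. The pool sizes and the number of genes per combination are hypothetical values, picked only so that the totals land near the "about 2 million" and "about 200" figures above; they are not the parameters of any actual reprogramming experiment.

```python
from math import comb

# All three values below are illustrative assumptions, not measured data.
CANDIDATE_GENES = 230   # assumed size of the unfiltered candidate pool
SHORTLIST_GENES = 12    # assumed size of a prioritized shortlist
GENES_PER_COMBO = 3     # assumed number of genes switched on together


def experiments_needed(pool_size: int, combo_size: int) -> int:
    """Count the distinct unordered gene combinations of a fixed size."""
    return comb(pool_size, combo_size)


exhaustive = experiments_needed(CANDIDATE_GENES, GENES_PER_COMBO)
prioritized = experiments_needed(SHORTLIST_GENES, GENES_PER_COMBO)

print(f"exhaustive:  {exhaustive:,} experiments")
print(f"prioritized: {prioritized:,} experiments")
```

Under these assumptions, testing every three-gene combination from 230 candidates takes roughly 2 million experiments, while a shortlist of 12 genes needs only a couple of hundred. The saving comes from shrinking the pool, because the count grows roughly with the cube of the pool size when three genes are combined.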
We are currently expanding KinderMiner using machine learning and other methods so that it can consider context, handle synonyms, and distinguish a negative hit from a positive one. We are also building a web application, so anyone can use it.
We aim to build tools that allow biologists, or anybody in related fields, to do their work faster and more easily.