Multi-omic data and interactive web tool made publicly available to aid COVID-19 research

Why is it that some COVID-19 patients become extremely ill and die, while others experience only mild symptoms?

The molecular underpinnings of COVID-19 are the subject of a recent collaboration between the Morgridge Institute for Research, the University of Wisconsin–Madison, and Albany Medical College. Their study uses mass spectrometry, RNA sequencing, and machine learning to explore the molecular traits that might influence the severity of the disease.

The project began when Dr. Ariel Jaitovich, a pulmonary and critical care physician at Albany Medical Center in New York, reached out to Josh Coon, Morgridge investigator in metabolism and professor of biomolecular chemistry at the UW–Madison School of Medicine and Public Health. They determined that the project could include RNA sequencing data with a greater computational scope, and brought on Ron Stewart, Morgridge investigator and associate director of bioinformatics, to lead those efforts.

Early in the COVID-19 epidemic, while the impacts of the novel coronavirus were still unknown, Jaitovich noted that patients admitted to the hospital for critical care displayed clinical outcomes that ranged widely in severity.

“We want to help people,” says Jaitovich. “We want to spend some energy in this terrible time to see if we can help the suffering people…that was the primary driver.”

Jaitovich and Coon, who had previously collaborated on a proteomic analysis in chronic obstructive pulmonary disease patients and animal models, recognized that a systemic, multi-omic approach could potentially help characterize the range of disease caused by this novel viral infection.

“The most impact we can probably have is to get this [data] into the hands of others as quickly as possible,” Coon adds. “I think that was something we thought about from the very beginning—how would we contribute to the global effort.”

The multi-institutional team recently made their findings available on the preprint server medRxiv while the paper is pending publication in a peer-reviewed journal.

They analyzed blood samples from 128 sick patients from the Albany Medical Center ICU—102 samples were positive for COVID-19, and 26 samples were identified as non-COVID-19 controls.

The researchers created a database of over 17,000 different proteins, metabolites, lipids, and RNA transcripts that have an association with clinical outcomes. They identified 219 molecular features that correlated strongly with COVID-19 severity.
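As a rough illustration of what it means for a molecular feature to correlate with severity, the sketch below ranks a few made-up features by their Spearman rank correlation with a made-up severity score. The feature names, abundance values, and severity scale are all hypothetical, and Spearman correlation is only one common choice; it is not presented here as the method the study used.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: one severity score per patient (higher = more severe)
# and the abundance of three invented features in the same six patients.
severity = [1, 2, 3, 4, 5, 6]
abundances = {
    "feature_A": [0.2, 0.5, 0.9, 1.4, 2.0, 3.1],  # rises with severity
    "feature_B": [5.0, 4.1, 3.9, 2.2, 1.8, 0.7],  # falls with severity
    "feature_C": [1.0, 3.0, 2.0, 1.5, 2.5, 1.2],  # no clear trend
}

# Rank features by the strength of their association with severity.
scores = {name: spearman(severity, vals) for name, vals in abundances.items()}
for name, rho in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: rho = {rho:+.2f}")
```

A rank-based measure is used in this sketch because it does not assume a linear relationship between abundance and severity, only a consistent direction.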

Many of these molecules and genes are involved in blood vessel damage and blood coagulation, as well as dysregulation of several processes involved in the immune response—results that have also been independently published in other research studies.

“It was interesting the way we put this together, because we had sort of two fronts on the analysis,” says Katie Overmyer, associate director of the Laboratory for Biomolecular Mass Spectrometry at UW–Madison. “One was a sort of data-driven exploration of biological stories. And then the other was based on what we were seeing in the news.”

One of Overmyer’s colleagues in the Coon lab, assistant staff scientist Evgenia Shishkova, noted that as they were collecting and analyzing their data, newly published research from other groups kept lining up with their findings.

“The fact that somebody with a totally different cohort of people using similar, but actually kind of different, methods could still see the same differences, I think was very powerful,” she says.

To accompany the dataset, the researchers also published an interactive web tool, covid-omics.app, as a free public resource for the scientific community.

“The goal with the tool was to create something that is sophisticated and powerful enough to enable powerful insights for people who are interested in particular processes or molecules, but also keep it sort of simple enough to be flexible,” says Ian Miller, a data scientist in the Coon lab, who led the efforts to design the web tool.

The researchers were able to analyze many different molecular entities that aren’t normally examined in the clinical setting, says Jaitovich. He also believes that this is, so far, the largest study focused on clinical outcomes and their association with COVID-19 severity, which separates it from studies focused solely on diagnostics.

“A lot of studies have compared COVID versus non-COVID as kind of like this dichotomy of black and white,” adds Miller. “But with our many different clinical measurements…we can think of it as a continuum, which is a unique aspect.”

Coon says that with the size of their dataset, they can use the web tool to determine how the abundance of molecules might correlate with the severity of infection. But he and the team agree that the real impact will come from the public accessibility of these data, along with the dynamic, interactive analysis capabilities of the web tool.

“When we find ourselves in a situation where it’s critical to make progress quickly, you know, that baseline information is out there and we can use it,” says Scott Swanson, a computational biologist in the Stewart group at the Morgridge Institute.

Swanson and postdoctoral fellow Matthew Bernstein are no strangers to working with large amounts of data in the bioinformatics field, and they see making the data available in a publicly accessible resource as an encouraging response to help combat the COVID-19 pandemic.

“People depositing their data publicly is extremely important,” adds Bernstein. “You’ll never know for what purpose it will be needed for.”

Stewart suggests that the analysis tool may be useful to other researchers who might be trying to identify drugs or metabolites that could be used as targets for therapeutics.

“There may be some really good things that come out of this in the long run,” he says. “The more accessible we can make our data, the more understandable metadata that surrounds it, that makes it more useful to the scientific community in general.”