The Center for High-Throughput Computing (CHTC) can turn potentially years of computing time into hours for data-intensive UW–Madison scientists.

Hunting viral variants across Wisconsin, powered by high-throughput computing

David O’Connor likened it to playing a game of Battleship.

When new and more dangerous variants of the COVID-19 virus emerged globally in late 2020 and early 2021, scientists needed to quickly track the “hits” and “misses” of these new strains in local communities. As a professor of pathology and laboratory science at UW–Madison, with expertise in tracking viral variants of HIV, O’Connor refocused his research efforts on the COVID-19 challenge in Wisconsin.

In the first months of the pandemic, O’Connor and research colleague Thomas Friedrich focused their genetic surveillance work primarily on Wisconsin’s two largest counties, Dane and Milwaukee, to better understand patterns of viral spread through the population. Later that summer, they applied the same tracking process to an outbreak within the UW–Madison intercollegiate athletics program. They pivoted a third time when COVID slammed two UW–Madison residence halls hard in the fall 2020 semester.

The goal each time was to track genetic changes in the virus from case to case, giving medicine an early warning system for genetic shifts that can change the trajectory of the epidemic. Early on, a single variant swept the globe, but the threat of subsequent variants seemed low, O’Connor says.

Then, around the turn of the year, “an inflection point happened” — the so-called UK variant, and other problematic strains from South Africa and Brazil.

“The scary thing about the UK variant was how quickly it displaced all the other viruses that were circulating before it arrived,” O’Connor says. “We realized around New Year’s that we really needed to up our game in terms of sequencing so that we could figure out when these variants arrived in our community.”

They did just that, partnering with the Dane County Health Department to obtain and sequence as many as 300 cases per week, which was close to 10 percent of all cases in the county. They were sequencing more cases per capita than almost anywhere in the United States.

What is absolutely critical to this business is speed, O’Connor says. Science can’t outrun the pandemic, but if it can at least keep pace, the data will get to public health leaders in time to actually make a difference in slowing the spread. Fortunately for Wisconsin, the UK variant did not take full hold in the state until early March, a time when widespread vaccinations were already making a difference.

“Basically, what CHTC allows us to do is process these hundreds of different samples in parallel so that we reduce the total turnaround time,” O’Connor says. “Our goal is to take less than 14 days from sample collection to sequencing to making the data publicly available. Processing the data serially, one sample after the next, would have made us much less agile.”

O’Connor and colleagues had something of a secret weapon in the data analysis — the Center for High-Throughput Computing (CHTC), which is a shared resource of the UW–Madison College of Letters and Science and the Morgridge Institute for Research. CHTC provides a form of distributed computing that automates management and assignment of computing tasks across a network of thousands of computers, essentially turning a single massive computing challenge into a supercharged fleet of smaller ones. The end result can shrink the overall run-time of a big data project from weeks or months down to a matter of hours while reducing the human effort required to manage the computational task.
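The difference between serial and parallel processing can be illustrated with a minimal sketch. It is a single-machine analogy only, not CHTC's software or the O’Connor lab's pipeline: the function `analyze_sample`, the sample names and the worker count are all hypothetical, and a short sleep stands in for the real per-sample analysis. The point it shows is that independent samples can be handed to many workers at once and still come back in the same order as a one-at-a-time run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_sample(sample_id: str) -> str:
    """Hypothetical stand-in for one independent per-sample analysis."""
    time.sleep(0.05)  # placeholder for real work; an assumption for illustration
    return f"{sample_id}:done"


def run_serial(sample_ids):
    """Process samples one after the next."""
    return [analyze_sample(s) for s in sample_ids]


def run_parallel(sample_ids, workers=8):
    """Process samples concurrently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_sample, sample_ids))


if __name__ == "__main__":
    samples = [f"sample-{i:03d}" for i in range(16)]

    start = time.perf_counter()
    serial_results = run_serial(samples)
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parallel_results = run_parallel(samples)
    parallel_seconds = time.perf_counter() - start

    # Both approaches produce the same results; only the elapsed time differs.
    print(serial_results == parallel_results)
    print(f"serial: {serial_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

A high-throughput system applies the same idea at a much larger scale, spreading independent tasks across many separate computers rather than across threads on one machine, and handling the scheduling and bookkeeping automatically.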

High-throughput computing, a technology that has existed since 1985, wasn’t created with public health challenges in mind. In fact, its early applications were in massive physical science projects such as cosmic neutrino detection, particle physics and gravitational-wave research. But it has grown into a scientific resource that conceivably has no boundaries.

After being applied to thousands of projects, CHTC has reached a state where adapting to a new challenge — in this case, a novel coronavirus that is taking over the world — does not require months of retrofitting and recalibration. It’s what Morgridge investigator and high-throughput computing founder Miron Livny refers to as “readiness.”

“For this readiness to happen, we needed to have the people, the technologies, and the computing resources all in place,” Livny says. “So our ability to respond to the COVID emergency was the result of having the right team that has been together a long time, for some more than 20 years. That is not common in this line of business.”

Adds Morgridge Investigator and CHTC team member Brian Bockelman: “Every new group that comes to us requires adapting. Every time we start up a new project, they learn a bit, we learn a bit. It’s really a translational form of computer science. You can do some computer science alone in a room with a whiteboard, but we only progress by having patients, so to speak.”

CHTC will be tending to a lot more “patients” in the years ahead thanks to the Partnership to Advance Throughput Computing (PATh), established in fall 2020. The five-year, $22.5 million project is supported by the National Science Foundation and is intended to increase adoption of high-throughput computing both at UW–Madison and nationally.

“Today, through our new partnership with the Morgridge Institute, we are expanding the reach of high-throughput computing to hundreds of Wisconsin scientists every year,” says Eric Wilcots, dean of the UW–Madison College of Letters and Science. “The HTC platform is helping democratize the world of research computing, applying it to virtually any subject where big data is produced.”

For O’Connor, the partnership will not end once COVID is vanquished. The alliance among hospitals, public health offices and CHTC will be available to adapt to future public health emergencies, and the researchers are looking to expand their data collection to environmental sampling, such as wastewater and air, that can forecast viral spread.

“If we get the virus under control here in the U.S., there are still going to be major unmet needs elsewhere in the world,” he says.