Research Computing

High-throughput computing: Fostering data science without limits

Biology and big data are now completely inseparable.

Most modern biology produces data sets too massive to manage with conventional tools, and the challenge will only grow as the science becomes more sophisticated.

The Center for High-Throughput Computing (CHTC), a partnership of the UW–Madison School of Computer, Data & Information Sciences and the Morgridge Institute, sees this onslaught of data and says: Bring it on.

Miron Livny

“We have established a goal of never letting the amount of data limit the experimental approach of the scientists,” says Miron Livny, the founder of high-throughput computing (HTC). Livny has been championing HTC for more than three decades as a UW–Madison computer scientist, and more recently as the Morgridge Institute’s lead investigator of research computing.

HTCondor is workload-management software that breaks a large computational task into many smaller, independent jobs, allowing researchers to analyze far more data (hence the term “high throughput”). The team now handles 250–300 projects a year, double the load of five years ago, and delivers hundreds of millions of hours of computing time.
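For readers curious what that looks like in practice, here is a minimal sketch of an HTCondor submit description file. The script name, file paths, resource requests, and job count are all hypothetical; the sketch simply shows how one workload is fanned out as many small, independent jobs.

    # analysis.sub -- a minimal HTCondor submit description file (a sketch;
    # the script, paths, resources, and job count below are hypothetical).
    executable     = analyze_sample.sh
    # $(Process) expands to 0, 1, 2, ... so each job works on its own chunk.
    arguments      = $(Process)
    output         = logs/job_$(Process).out
    error          = logs/job_$(Process).err
    log            = analysis.log
    request_cpus   = 1
    request_memory = 2GB
    # Queue 500 independent jobs from this one description.
    queue 500

Running condor_submit analysis.sub places 500 independent jobs in the queue; HTCondor then matches each job to an available machine, which is how a single analysis can spread across hundreds or thousands of cores.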

And that’s just at UW–Madison. The global Open Science Grid extends HTC resources worldwide, serving as the backbone for Nobel Prize-winning projects such as the detection of gravitational waves and the discovery of new subatomic particles. Just this year, it made a splash for its contribution to the first image of the supermassive black hole at the center of our galaxy.

The service is gaining adherents on campus as scientists learn that it offers more than someone asking, “What technology do you need?” Research computing is a collaboration, and the people HTC brings to the equation matter more than the technology.

Livny points to the HTC Facilitation Team as a prime example. The emphasis on facilitators was well ahead of its time, almost unheard of in computer science circles. Facilitators are the translators between the technology and the bench experiments, finding the ways for scientists to get the most out of their data.

Livny uses a hospital metaphor: like an emergency room, HTC is not dedicated to one disease or one family of health challenges. It takes all comers, whether the case is particle physics, brain science, or COVID-19. The facilitators help determine the right computational “medicine” for each one.

The UW–Madison and Morgridge sides of HTC work together seamlessly; by design, one can’t tell where one ends and the other begins. But Morgridge provides a unique ingredient: Livny says the institute’s hiring flexibility allows the group to hire unconventional talent, people who might not be optimal fits for tenure-track roles but are perfect for advancing HTC as a core service.

Brian Bockelman

Brian Bockelman came on board in 2019 as a Morgridge research computing investigator, with decades of HTC experience on big physical science projects such as the Large Hadron Collider at CERN in Switzerland and the IceCube Neutrino Observatory at the South Pole. He has been able to apply that experience to the massive computational needs now emerging in biological research.

For example, Bockelman has assisted the research computing team at the UW–Madison Cryo-EM Research Center, which is helping scores of UW–Madison scientists incorporate atomic-scale imaging into their research.

“Research computing’s real success is when researchers change the way they do science because of questions we ask, as well as the computing we provide them, opening their eyes to things they didn’t know were possible,” Livny says. “Ultimately, established scientists are able to think differently about the science itself, rather than just solving one distinct problem.”