In August 2015, just before going on vacation, virology researcher Dave O’Connor teed up the largest data analysis challenge of his lifetime. The computing run included 694 independent jobs, each one with about one billion points of genomic data to process.
O’Connor returned to find that his “set it and forget it” gamble had paid off handsomely: 693 of the 694 jobs had run to completion, with zero human intervention.
“It was like a freight train,” recalls O’Connor, a UW-Madison associate professor of pathology and laboratory medicine. “We started at station one, let it run for a while, then it moved automatically on to station two.”
O’Connor’s first successful foray into high-throughput computing (HTC) has opened the door to vast new possibilities in his research on human immunodeficiency virus (HIV), the virus that causes AIDS. O’Connor works in the Wisconsin National Primate Research Center and studies the primate equivalent of HIV, known as simian immunodeficiency virus (SIV). Until now, primate researchers have been limited by the relative lack of quality genomic information on primates.
“This is absolutely transformative,” says O’Connor. “We are able to contemplate questions we were never able to contemplate before. Suddenly we’ve gone from data on about 20 animals to more than 400 animals of data, over the course of about two months.”
This is a critical time to make such a quantitative leap in primate research. The National Institutes of Health is about to embark on a project to study the genomes of more than a million people, with the recognition that such massive cohorts of data will be essential to finding the genetic signatures of complex diseases.
While it may never catch up to the human genome project, O’Connor says a parallel primate sequencing effort would provide a valuable complement to studying diseases. Primate research operates in a very tightly controlled environment, where ethical standards require the use of as few research animals as possible. Detailed clinical information is compiled on each animal from birth to death, which further improves the value of genetic data.
In the case of rhesus macaques, which are used in O’Connor’s research, virtually all have come from select populations in India, and none have been taken from the wild in more than four decades. That means primates bred in captivity will have less genetic variation to sort through, which could make finding the genetic “needle in a haystack” easier than it is with humans. The relevance is also significant: humans and macaques share about 94 percent of their genomes.
The specific genetic targets in O’Connor’s lab concern a highly promising population called “HIV elite controllers.” About two decades ago, doctors in AIDS clinics across the nation began encountering a very small number of patients who were infected with HIV but never developed AIDS, and in fact had very little detectable virus in their blood. There were usually no more than one or two patients from every community.
Recognizing the medical importance, prominent HIV scientist Dr. Bruce Walker of Massachusetts General Hospital began an influential project to pull together genomic information from all elite controller cases in the country, to help determine what makes this group special.
It turns out that science so far has produced an incomplete answer, O’Connor says. Researchers found specific genetic variants in the major histocompatibility complex, or MHC, that make people more likely to become elite controllers. However, only about 50 percent of people with the “protective” MHC variants become controllers, meaning one or more other locations in the genome must also be working together with MHC to cause this immunity.
O’Connor says that same type of elite controller population exists in research primates with SIV, and his work focuses on identifying those genetic pathways that could lead to naturally suppressing AIDS, a disease that impacts more than 40 million people worldwide. However, the current reference genome for primates is simply not developed enough to be able to provide real answers.
That’s where the recent high-throughput computing work could prove transformative, O’Connor says, if it can be applied to sequencing the genomes of all primates used in federal primate research facilities across the country. This would begin to build a resource that adds great value to any primate studies involving human disease.
“My belief is that a few years from now, it will be considered improper to not get full genetic characterization of every animal that goes into an experiment, because the cost of getting that information is dwarfed by the costs of the experiment itself,” says O’Connor. “And the potential for that data to inform your experiments is enormous regardless of what you’re studying, whether it be infectious disease, neurobiology, or regenerative medicine.”
While the work is still at a very early stage, O’Connor says his lab has been able to identify some variation in the genes suspected to be important to elite controllers, based on the high-throughput data compiled to date. And the more data the lab can compile, the less likely it is to chase false leads.
“This is not going to provide instant solutions,” cautions O’Connor. “But it’s going to provide lead generation. We can now look and say, ‘This animal has an interesting phenotype. Can we find any evidence of unusual variation across these genes of interest?’”