Biologist Sydney Brenner, in his 2002 Nobel Prize lecture, cautioned that science is now “drowning in a sea of data and starving for knowledge” — a reflection of how our information explosion does not guarantee greater insight.
Powerful online databases such as PubMed, the leading biomedical research archive that contains more than 39 million abstracts (and counting), are an interesting case in point. Mining this database has great potential to inform new scientific directions, but it also produces hundreds of false positives that may leave a scientist treading water.
A biostatistics team at the Morgridge Institute for Research in Madison is tackling this challenge with a hybrid solution that combines two different “flavors” of machine learning algorithms. The results, reported in the January 22 issue of the journal BMC Bioinformatics, show how the two systems, combined, offset each other’s limitations and deliver a one-two punch for evaluating millions of journal articles to assess the plausibility of a hypothesis.
One is called “literature-based discovery” (LBD), which helps researchers search for unexpected connections in past findings to inform new directions. Morgridge scientists created an LBD tool known as Serial KinderMiner (SKiM), which searches across three lists of terms, for example medical conditions, human genes, and FDA-approved drugs, to identify co-occurrence links in the literature, allowing scientists to connect drugs to diseases via their shared genetic pathways.
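To make the idea concrete, here is a minimal Python sketch of the kind of three-way co-occurrence linking an LBD tool like SKiM performs. The term lists, PubMed IDs, and abstracts are invented stand-ins, not the tool’s actual data or code.

```python
from itertools import product

# Toy stand-ins for the three term lists a SKiM-style search uses.
drugs = {"metformin", "lisinopril"}
genes = {"PRKAA1", "ACE"}
diseases = {"type 2 diabetes", "hypertension"}

# Map of PubMed ID -> terms detected in that abstract (invented examples).
abstract_terms = {
    "PMID:111": {"metformin", "PRKAA1"},
    "PMID:222": {"PRKAA1", "type 2 diabetes"},
    "PMID:333": {"lisinopril", "ACE", "hypertension"},
}

def cooccurring_pairs(terms_a, terms_b, corpus):
    """Return (a, b) pairs that appear together in at least one abstract."""
    pairs = set()
    for found in corpus.values():
        for a, b in product(terms_a & found, terms_b & found):
            pairs.add((a, b))
    return pairs

drug_gene = cooccurring_pairs(drugs, genes, abstract_terms)
gene_disease = cooccurring_pairs(genes, diseases, abstract_terms)

# Chain the two pair sets through the shared gene: drug -> gene -> disease.
for drug, gene in sorted(drug_gene):
    for g, disease in sorted(gene_disease):
        if g == gene:
            print(f"{drug} -> {gene} -> {disease}")
```

In the real tool, co-occurrence statistics computed over millions of PubMed abstracts replace this toy set intersection, but the drug-to-gene-to-disease chaining is the same basic idea.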
The other is the more familiar “large language model” (LLM), a class of systems trained on massive bodies of text to understand, interpret and produce language that mimics human communication. LLMs are now the ubiquitous engines behind AI chat tools like ChatGPT and Claude.
Nodding at its hybrid nature, the team dubbed the resulting tool SKiM-GPT.

“By adding an AI element to our system, we solve two problems,” says Morgridge scientist Jack Freeman, lead author on the project. “First, we reduce a lot of false positives that surface from just co-occurrence because so many biological terms share the same set of characters. Second, we’re now able to go much further than associations. We’re able to start getting at the actual mechanistic relationship between these biomedical terms.”
Equally compelling, says Morgridge Investigator Ron Stewart, is SKiM-GPT’s ability to tackle a thorny problem with LLMs: their propensity to “hallucinate” results and to invent the citations used to support their conclusions. The co-occurrence-based SKiM step supplies real PubMed abstracts for retrieval-augmented generation (RAG), grounding the LLM’s answers in the actual literature and curbing hallucinations. By combining RAG with the LLM, SKiM-GPT offsets the weaknesses of each approach used on its own.
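As a rough illustration of how that grounding works, here is a minimal retrieval-augmented generation sketch. The functions retrieve_abstracts and call_llm are hypothetical placeholders standing in for the co-occurrence retrieval step and whatever LLM service is used, and the abstracts are invented toy text, not SKiM-GPT’s implementation.

```python
# A minimal retrieval-augmented generation (RAG) sketch, not SKiM-GPT itself.
# retrieve_abstracts and call_llm are hypothetical placeholders.

def retrieve_abstracts(hypothesis: str) -> list[dict]:
    # In SKiM-GPT this role is played by the co-occurrence hits from PubMed;
    # here it returns hard-coded toy abstracts.
    return [
        {"pmid": "PMID:111", "text": "Metformin activates AMPK (PRKAA1) in liver cells."},
        {"pmid": "PMID:222", "text": "PRKAA1 signaling is altered in type 2 diabetes."},
    ]

def build_prompt(hypothesis: str, abstracts: list[dict]) -> str:
    # Handing the model real abstracts and restricting it to those PMIDs is
    # what keeps citations tied to literature that actually exists.
    context = "\n\n".join(f"[{a['pmid']}] {a['text']}" for a in abstracts)
    return (
        f"Hypothesis: {hypothesis}\n\n"
        f"Abstracts:\n{context}\n\n"
        "Using ONLY the abstracts above, state whether each one supports, "
        "contradicts, or is neutral toward the hypothesis, and cite its PMID."
    )

def call_llm(prompt: str) -> str:
    # Plug in whatever LLM client is available; omitted here on purpose.
    raise NotImplementedError

hypothesis = "Metformin affects type 2 diabetes risk through PRKAA1."
prompt = build_prompt(hypothesis, retrieve_abstracts(hypothesis))
# answer = call_llm(prompt)
```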
Every hypothesis run through SKiM-GPT comes back with three key outputs: a scored hypothesis; the rationale used to evaluate the question; and specific citations for every abstract used in the review, Stewart says. To produce the score, the AI classifies each retrieved abstract as agreeing with, disagreeing with, or neutral toward the hypothesis. Based on that assessment across all the relevant abstracts, each hypothesis receives a score ranging from minus-two (least plausible) to plus-two (most plausible).
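How those per-abstract judgments could roll up into a single number is sketched below. The binning thresholds are assumptions chosen purely for illustration, not the scoring rubric published in the paper.

```python
def score_hypothesis(labels: list[str]) -> int:
    """Map per-abstract labels ('agree', 'disagree', 'neutral') to a -2..+2 score.

    The binning thresholds below are assumptions for illustration only.
    """
    support = labels.count("agree")
    oppose = labels.count("disagree")
    considered = support + oppose
    if considered == 0:
        return 0  # only neutral evidence, so no call either way
    balance = (support - oppose) / considered  # ranges from -1.0 to +1.0
    if balance >= 0.6:
        return 2
    if balance >= 0.2:
        return 1
    if balance <= -0.6:
        return -2
    if balance <= -0.2:
        return -1
    return 0

# Two supporting abstracts, one neutral, one contradicting -> mildly plausible.
print(score_hypothesis(["agree", "agree", "neutral", "disagree"]))  # prints 1
```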
The researchers also created an intermediate step between SKiM co-occurrence and the LLM, called a relevance filter. Because SKiM produces a huge number of initial literature associations, this step quickly and cheaply identifies which abstracts are actually useful for evaluating the hypothesis, ensuring that the downstream LLM sees only highly relevant material.

“The relevance filter is an important piece, because we get just the relevant abstracts to pass to the more expensive reasoning LLM,” Stewart says.
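The sketch below shows the general shape of such a filtering pass. The keyword heuristic is only a stand-in, since the actual relevance filter may use a lightweight model rather than simple string matching, and the abstracts are invented examples.

```python
def is_relevant(abstract: str, key_terms: set[str], min_hits: int = 2) -> bool:
    """Cheap screen: keep an abstract only if enough key terms appear in it."""
    text = abstract.lower()
    hits = sum(1 for term in key_terms if term.lower() in text)
    return hits >= min_hits

# Invented toy abstracts; only the first has any bearing on the hypothesis.
abstracts = [
    "Metformin activates PRKAA1 signaling in hepatocytes.",
    "A survey of hospital staffing levels in rural clinics.",
]
key_terms = {"metformin", "PRKAA1", "type 2 diabetes"}

relevant = [a for a in abstracts if is_relevant(a, key_terms)]
print(relevant)  # only the metformin/PRKAA1 abstract passes the filter
```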
What would this process cost in human effort if done manually? To find out, the researchers worked through 14 hypotheses, reading 10 abstracts for each and recording how long the evaluations took. On average, it took 45 minutes to evaluate a single hypothesis and judge whether it pointed to a scientifically valuable relationship.
Scaled up to the much larger sets of hypotheses a full SKiM-GPT screen can cover, that’s roughly 336 hours of work reduced to minutes. The system saves about 97 percent of a researcher’s time while cutting the cost of running the AI to a fraction of what a comparable query would otherwise require.
The team compared the manual results from the four scientists who performed that review with SKiM-GPT’s final output and found that the automated tool agreed with the human assessments on 13 of the 14 hypotheses, or 93 percent.
“I think it’s both a lead generation and validation tool,” adds Stewart, who leads the Morgridge bioinformatics team. “SKiM-GPT can help wet lab researchers choose the best experiments to run by discovering, ranking, and evaluating hypotheses based upon existing literature. But it also can be used to validate an existing hypothesis.”