Scholarly snowball: Deep learning paper generates big online collaboration

Bioinformatics professors Anthony Gitter and Casey Greene set out in summer 2016 to write a paper about biomedical applications for deep learning, a hot new artificial intelligence field striving to mimic the neural networks of the human brain.

They completed the paper, but also triggered an intriguing case of academic crowdsourcing. Today, the paper has been massively revised with the help of more than 40 online collaborators, most of whom contributed enough ideas to become co-authors.

Gitter, of the Morgridge Institute for Research and the University of Wisconsin–Madison, and Greene, of the University of Pennsylvania, both apply computational tools to big challenges in health and biology. They wanted to see where deep learning was making a difference and where the untapped potential lies in the biomedical world.

By academic journal standards, the review paper, “Opportunities and obstacles for deep learning in biology and medicine,” has put up some seriously big numbers. Since it was posted on the preprint server bioRxiv (pronounced ‘Bioarchive’) in May 2017, the PDF of the paper has been downloaded more than 23,000 times. It has drawn comments from more than 500 Twitter users, and it includes 538 references.

Then this December, the authors got further proof of its reach. In the news article “The science events that shaped the year,” the journal Nature cited it as one of the most popular papers of 2017.

Gitter says the popularity is nice, but the precedent set by how the collaboration shaped the research is more meaningful. In all, about 25 percent of the paper today reflects new material from scientists who weighed in after the piece went online. An updated version will be resubmitted to the journal where it’s under review.

Gitter likened the process to how the open source software community works.

“We are basically taking a software engineering approach to writing a scholarly paper,” he says. “We’re using the GitHub website as our primary writing platform, which is the most popular place online for people to collaborate on writing code.”

Adds Gitter: “We also adopted the software engineering mentality of getting a big team of people to work together on one product, and coordinating what needs to be done next.”

The new authors frequently provided examples of how deep learning is impacting their corner of science. For example, Gitter says one scientist contributed a section on cryo-electron microscopy, a new must-have tool for biological imaging that is now using deep learning techniques. Others rewrote portions to make the paper more accessible to non-biologists or provided ethical background on medical data privacy.

Deep learning is part of a broader family of machine learning tools that has made breakthrough gains in recent years. It feeds inputs through multiple layers of a neural network, and training adjusts the connections between those layers. The trained network can identify and describe recurring features in data and can predict outputs for new inputs. Deep learning can also work in “unsupervised” mode, where it identifies interesting patterns in data without being told what to look for.
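
For readers who want to see what "multiple layers" means in practice, here is a minimal Python sketch, not taken from the paper. It passes two inputs through one hidden layer and one output layer. The weights are hand-picked for illustration (an assumption; a real deep learning system learns them from data), and the names `dense`, `relu` and `forward` are hypothetical helpers rather than part of any library.

```python
def relu(values):
    """Activation function: keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, chosen so the network computes
# "exactly one of the two inputs is on". Training would normally find these.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(inputs):
    """Feed the inputs through the hidden layer, then through the output layer."""
    hidden = relu(dense(inputs, HIDDEN_W, HIDDEN_B))
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print((a, b), forward([a, b]))
```

The output is 1.0 only when exactly one input is on, a pattern that no single layer of weighted sums can capture on its own. That is a small-scale hint of why stacking layers matters.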

One famous example of unsupervised deep learning is when a Google-produced neural network identified that the three most important components of online videos were faces, pedestrians and cats — without being told to look for them.

Deep learning has transformed applications such as face recognition, speech recognition and language translation. Among the scores of clever applications is a program that learns the signature artistic traits of famous painters, and then transforms everyday pictures into a Van Gogh, Picasso or Monet.

Greene says deep learning has not yet revealed the “hidden cats” in healthcare data, but there are some promising developments. Several studies are using deep learning to better categorize breast cancer patients by disease subtype and most beneficial treatment option. Another effort trains deep learning models on huge natural image databases so they can diagnose diabetic retinopathy and melanoma. These applications surpassed some state-of-the-art tools.

Deep learning is also contributing to better clinical decision-making, higher success rates in clinical trials, and tools that better predict the toxicity of new drug candidates.

“Deep learning tries to integrate things and make predictions about who might be at risk to develop certain diseases, and how we can try to circumvent them early on,” Gitter says. “We could identify who needs more screening or testing. We could do this in a preventative, forward thinking manner. That’s where my co-authors and I are excited. We feel like the potential payoff is so great, even if the current technology cannot meet these lofty goals.”