The tools of modern biology have made it possible to obtain an incredibly detailed picture of how cancer cells differ from healthy cells at the molecular level. Somewhat paradoxically, despite these meticulous portraits of cancer, it remains remarkably difficult to answer a very fundamental question: What caused cancer in this patient?
“Cause” here does not refer to carcinogen exposure, DNA replication errors, or other factors that induce molecular changes. Rather, it refers to which of the differences observed in a tumor cell promoted the transition from a healthy to a diseased state. A limited number of alterations “drive” this transition to cancer, whereas the rest are secondary effects that arise in cancerous cells but have little functional impact. Genes that contain driver alterations are known as driver genes.
In this discussion I focus on mutations, small changes in the DNA of a cancerous cell, but the ideas also apply to other drivers of cancer, such as copy number amplifications and deletions, gene fusions, non-coding RNAs, and epigenetic changes. In some tumors, infamous driver genes are mutated, partially explaining the cancer’s origin. However, oftentimes all mutations in a tumor are rare or even unique, making the cause of cancer in that patient a mystery (Figure 1).
We don’t need to make the leap from candidate drivers to the final outcome – cancer – all at once. Working backwards and incorporating existing cancer and molecular biology knowledge allows us to decompose the problem into more manageable steps.
Tumor cells have certain common properties, often called the hallmarks of cancer. Examples include uncontrolled growth and ignoring the signals that typically tell a cell to die (Figure 2). Moving one step further back, we find that specific biological pathways govern these hallmark processes.
Pathways are groups of proteins that communicate with each other and operate together to execute some biological function. Complex biological processes like cell division require dozens of proteins working in concert to function properly. When a pathway member malfunctions and communication breaks down, a normal biological process can become a cancer-promoting hallmark trait (for example, the cell death pathway no longer responds to incoming signals).
These pathways can act as convergence points and bring order to the chaotic mutational landscape. Even though each tumor’s mutations are largely unique, some of them alter protein function, and different altered proteins can have similar effects in a cell if they work together in the same pathway. Tracing the chain from DNA mutations to affected proteins, dysregulated pathways, associated hallmarks, and finally cancer provides a structured way to reason about which mutations may drive cancer.
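To make the convergence idea concrete, here is a minimal sketch that traces mutated genes to pathways and hallmarks. All gene names, pathway names, and the gene-to-pathway mapping are hypothetical placeholders rather than real annotations, and every listed gene is assumed to carry a function-altering mutation.

```python
from collections import Counter

# Hypothetical placeholder annotations (not real gene or pathway assignments).
GENE_TO_PATHWAY = {
    "GENE_A": "cell_death",
    "GENE_B": "cell_death",
    "GENE_C": "cell_death",
    "GENE_D": "growth_signaling",
    "GENE_E": "growth_signaling",
}
PATHWAY_TO_HALLMARK = {
    "cell_death": "evading cell death",
    "growth_signaling": "uncontrolled growth",
}

# Genes assumed to carry function-altering mutations in each tumor.
# GENE_X* stand for genes outside any annotated pathway.
TUMORS = {
    "tumor1": {"GENE_A", "GENE_X1"},
    "tumor2": {"GENE_B", "GENE_D"},
    "tumor3": {"GENE_C", "GENE_X2", "GENE_X3"},
}


def gene_hit_counts(tumors):
    """Number of tumors in which each gene is mutated."""
    return Counter(g for genes in tumors.values() for g in genes)


def pathway_hit_counts(tumors, gene_to_pathway):
    """Number of tumors with at least one mutated gene in each pathway."""
    counts = Counter()
    for genes in tumors.values():
        # A set, so a pathway is counted at most once per tumor.
        counts.update({gene_to_pathway[g] for g in genes if g in gene_to_pathway})
    return counts


def hallmarks_per_tumor(tumors, gene_to_pathway, pathway_to_hallmark):
    """Trace each tumor's mutated genes -> pathways -> hallmarks."""
    result = {}
    for tumor, genes in tumors.items():
        pathways = {gene_to_pathway[g] for g in genes if g in gene_to_pathway}
        result[tumor] = {pathway_to_hallmark[p] for p in pathways}
    return result


print(gene_hit_counts(TUMORS))
print(pathway_hit_counts(TUMORS, GENE_TO_PATHWAY))
print(hallmarks_per_tumor(TUMORS, GENE_TO_PATHWAY, PATHWAY_TO_HALLMARK))
```

In this toy data every gene is mutated in only one tumor, so gene-level counts reveal nothing, yet the placeholder `cell_death` pathway is hit in all three tumors.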
So is the driver identification problem solved? Not quite.
Although experts have compiled extensive resources describing pathways in human cells, these can be far too limited for disease analysis.
Like a hurricane ravaging the entire Eastern United States, cancer’s devastation does not respect human-drawn (state/pathway) boundaries. Rather than ask which predefined pathways have been impacted, we must learn which proteins may be operating together. This becomes feasible with the large tumor datasets from The Cancer Genome Atlas and the International Cancer Genome Consortium, combined with existing data that specify which pairs of proteins can potentially communicate directly.
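As a rough illustration of grouping proteins without predefined pathways, the sketch below uses a simple heuristic, not the approach the article describes: it keeps only interactions between proteins that are mutated in at least one tumor, then reports connected groups of mutated proteins and how many tumors touch each group. The protein names, interaction pairs, and tumor profiles are hypothetical.

```python
from collections import defaultdict

# Hypothetical pairs of proteins that can potentially communicate directly.
INTERACTIONS = [("P1", "P2"), ("P2", "P3"), ("P3", "P9"), ("P4", "P5"), ("P6", "P7")]

# Hypothetical mutated proteins per tumor.
TUMORS = {"t1": {"P1"}, "t2": {"P2", "P4"}, "t3": {"P3"}, "t4": {"P5", "P8"}}


def mutated_modules(interactions, tumors):
    """Connected components of the interaction graph restricted to mutated proteins."""
    mutated = set().union(*tumors.values())
    adj = defaultdict(set)
    for a, b in interactions:
        if a in mutated and b in mutated:
            adj[a].add(b)
            adj[b].add(a)
    seen, modules = set(), []
    for start in sorted(mutated):  # sorted for a deterministic module order
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        modules.append(comp)
    return modules


def module_coverage(modules, tumors):
    """Number of tumors with at least one mutated protein in each module."""
    return [sum(1 for prots in tumors.values() if prots & m) for m in modules]


modules = mutated_modules(INTERACTIONS, TUMORS)
print(modules, module_coverage(modules, TUMORS))
```

Here P1, P2, and P3 are each mutated in a single tumor, but because they can communicate directly they form one group touched by three tumors. This heuristic drops unmutated proteins such as P9 that might link mutated ones; a fuller model would also need to consider them.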
Combining these data, we can use a machine learning framework known as expectation maximization to propose the hidden pathways, determine how mutations in individual tumors affect those pathways, and refine the pathway estimates based on similarities across many tumors. The end result is a custom pathway model for each tumor that suggests how the hallmarks of cancer may have arisen and nominates the mutations affecting proteins in that pathway as plausible drivers.
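The alternating structure of expectation maximization can be illustrated with a much simpler model than the one described above. The sketch below fits a generic Bernoulli mixture to a binary tumor-by-gene mutation matrix: each hidden group stands in for a "hidden pathway" with its own per-gene mutation probabilities. The E-step estimates how strongly each tumor belongs to each group, and the M-step refines the group estimates using all tumors. It does not use protein interaction data, and the toy matrix, gene indices, and deterministic initialization are assumptions made for illustration.

```python
import math


def init_theta(X, k):
    """Farthest-point seeding: start from row 0, then add the row with the
    largest Hamming distance to its nearest chosen seed."""
    seeds = [0]
    while len(seeds) < k:
        def nearest(i):
            return min(sum(a != b for a, b in zip(X[i], X[s])) for s in seeds)
        seeds.append(max(range(len(X)), key=nearest))
    # Soften seed rows toward 0.5 so no starting probability is exactly 0 or 1.
    return [[0.25 + 0.5 * x for x in X[s]] for s in seeds]


def em_bernoulli_mixture(X, k, n_iter=50, eps=1e-6):
    """Fit a k-group Bernoulli mixture to binary rows X (tumors x genes) by EM."""
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k   # prior probability of each hidden group
    theta = init_theta(X, k)  # theta[j][g]: P(gene g mutated | group j)
    resp = []
    for _ in range(n_iter):
        # E-step: resp[i][j] = P(group j | tumor i), computed in log space.
        resp = []
        for row in X:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for g in range(d):
                    p = min(max(theta[j][g], eps), 1 - eps)
                    lp += math.log(p if row[g] else 1 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate group weights and per-gene mutation probabilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), eps)
            weights[j] = nj / n
            theta[j] = [sum(r[j] * row[g] for r, row in zip(resp, X)) / nj
                        for g in range(d)]
    return weights, theta, resp


# Hypothetical toy data: rows are tumors, columns are genes G0..G3.
X = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
weights, theta, resp = em_bernoulli_mixture(X, k=2)
groups = [max(range(2), key=lambda j: r[j]) for r in resp]
print(groups)
```

The first two tumors share no mutated gene, yet they end up in the same group because G0 and G1 co-occur elsewhere in the data. That is a small-scale version of "refining estimates based on similarities across many tumors."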
The ability to learn the hidden pathways that are disrupted in individual tumors could have a profound impact on cancer biology and oncology.
Anecdotally, researchers have already recognized that tumors originating in different tissues may have similar drivers. For instance, certain breast cancers resemble the most common subtype of ovarian cancer. Classifying tumors based on their pathway alterations would provide a systematic strategy for identifying cancer subtypes across all types of cancer, potentially revealing seemingly unrelated tumors that could benefit from similar therapies.
In an individual patient, complete knowledge of the pathway structure that drives the tumor opens new avenues for precision medicine, in which therapies could target not just mutated genes but also inferred, unaltered pathway members. This computational research is in its early stages and may not yield immediate breakthroughs. But ambitious long-term aspirations provide the best guidance for short-term research directions.