In a recent webinar hosted by Fierce Biotech, Clinithink joined AstraZeneca and Barts Health NHS Trust to share the story of an ambitious AI project: building a predictive model for early lung cancer detection by mining unstructured clinical data alongside conventional sources. The session offered a rare look at how pharmaceutical innovation, clinical practice, and technology intersect in the real world.
The most revealing insights emerged during the attendee Q&A. These were not abstract hypotheticals; they were the kinds of questions raised in every pharma boardroom and strategy session: How does this scale? What makes it novel? Can it be trusted?
Below, we expand on the most pressing questions and explain why each matters.
What did unstructured data reveal that structured data couldn’t?
The project uncovered predictive features that structured data alone could not reveal. While structured fields like ICD codes can indicate that a patient has been diagnosed with lung cancer, they provide little insight into the patient’s clinical context. In contrast, narrative data—detailed clinical notes written by physicians—capture the subtleties of the patient’s journey: symptom onset, history of smoking, comorbidities, and clinical impressions that help paint a fuller picture of disease presentation and progression.
By transforming that unstructured text into a computable format, the model uncovered surprising predictors. For instance, congenital abnormalities of the great vessels appeared as a meaningful feature. Although these anomalies do not cause cancer, they appear in imaging reports and can influence referrals or mask symptoms, both of which may affect when and how a diagnosis is made.
Cough, a common symptom across a wide range of conditions, also became more meaningful in context. Rather than treating it as a standalone risk factor, the model learned to weigh it differently when other coexisting factors were present. As Dr. Will Ricketts, a specialist in lung cancer at Barts Health and an investigator for the study, noted during the discussion, “It’s pulling out things that we would never have considered as intrinsically obvious causes for a cough. It’s not that these things reduce your risk of having lung cancer, but they reduce your risk of your cough being related to lung cancer.” In other words, the model doesn’t just flag symptoms: it interprets them in context, discerning when a cough is likely benign and when it may signal something more serious. This kind of capability can make national lung-cancer screening programs more successful, economical, and scalable.
By identifying subtle, contextual clues—like when a common symptom becomes clinically meaningful—this model can help surface cases that traditional screening programs often miss. It’s particularly valuable for reaching patients who don’t meet rigid eligibility criteria or never present for screening at all.
But with any effort to expand the detection net, a natural concern arises: could this approach increase false positives?
Does this model address false positives in screening?
It’s a natural question, particularly given the longstanding concerns around false positives in low-dose CT (LDCT) lung screening. But it’s important to clarify that this model isn’t meant to replace screening—it’s designed to complement it.
Current screening programs have made significant strides. For example, the SUMMIT trial brought the false positive rate down below 5%, compared to the 25% often cited from earlier U.S. studies. Yet screening captures fewer than half of lung cancer cases. This model was built to find the other half—those who fall outside the age range, who don’t meet risk model thresholds, or who simply don’t show up for screening.
In that sense, AI helps identify patients who might otherwise go undetected until it’s too late for curative treatment. Rather than creating more noise, it adds signal in areas where the existing system is silent.
How is the model’s impact being measured—and can it scale?
The team is currently validating the model through a prospective clinical trial, with stage shift as the key outcome metric. A stage shift means diagnosing patients at a different, ideally earlier, stage of disease than they would otherwise have been found. The goal isn’t necessarily to diagnose more people; it’s to diagnose them earlier, when treatment can make the biggest difference. Patients diagnosed with stage I or II disease typically can expect to survive their lung cancer, while retrospective data from the study suggest that those with more advanced stage III or IV disease are unfortunately unlikely to survive.
The study projects a 9–10% shift from late- to early-stage diagnosis. That’s meaningful not just in terms of survival, but in terms of economics. Early-stage cancer treatment tends to be both more effective and less costly. The health economic case for broader adoption is compelling.
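To make the stage-shift metric concrete, here is a minimal sketch of how such a shift might be quantified. The cohort counts below are entirely hypothetical, chosen only to illustrate a roughly 10-percentage-point shift of the kind the study projects; they are not data from the trial.

```python
# Minimal sketch of quantifying a stage shift as the change in the
# proportion of diagnoses made at an early (stage I or II) stage.
# All cohort counts below are hypothetical, for illustration only.

def early_stage_proportion(stage_counts):
    """Fraction of diagnoses made at stage I or II."""
    early = stage_counts.get("I", 0) + stage_counts.get("II", 0)
    total = sum(stage_counts.values())
    return early / total

# Hypothetical stage distributions per 100 diagnosed patients:
# standard pathway vs. AI-assisted detection.
baseline = {"I": 15, "II": 10, "III": 30, "IV": 45}    # 25% early stage
with_model = {"I": 22, "II": 13, "III": 27, "IV": 38}  # 35% early stage

shift = early_stage_proportion(with_model) - early_stage_proportion(baseline)
print(f"Stage shift: {shift:.0%} of diagnoses moved to early stage")
# → Stage shift: 10% of diagnoses moved to early stage
```

In a real trial the comparison would be between staging distributions in the intervention and control arms, but the arithmetic of the endpoint is this simple: a change in the share of patients caught while curative treatment is still possible.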
For AstraZeneca, the project passed two critical internal tests: Could it be done? And was it cost-effective? This cross-functional buy-in—from R&D to digital health—signals both the feasibility and strategic fit of this approach within pharma-led innovation.
What does this mean for evidence generation in oncology?
This project signals a fundamental shift in how evidence is generated—particularly for pharma. Traditional methods like chart reviews or structured EHR queries still have value, but they’re expensive, time-consuming, and hard to scale. More critically, they often miss nuance.
By unlocking unstructured clinical notes with AI, we’re moving from static, retrospective analysis to dynamic, real-world discovery. This enables teams to surface clinical patterns without predefined hypotheses—to observe the journey before defining the rules.
For pharma teams looking to understand real-world treatment pathways, refine inclusion criteria, or explore new endpoints, this approach doesn’t just streamline evidence generation—it transforms what’s possible.
Can this approach be applied to other cancers, rare genetic diseases, or other complex conditions?
Absolutely. Cancers and rare genetic disorders pose similar challenges: patients present with recognizable clinical patterns, yet clinicians may not spot them, especially when the disease is unfamiliar or at a very early stage.
By focusing on phenotypic patterns captured in clinical free text authored by physicians at the point of care, this approach can identify potential cases that would otherwise remain buried. This isn’t just theoretical. The same underlying technology has already been used to support rare disease diagnosis prediction, including a study recently published in Nature.
What does it take to get internal teams on board with this kind of work?
Implementing novel AI in clinical practice demands persistence and a willingness to challenge legacy processes. Most internal systems are designed around structured data and conventional RWE. Integrating unstructured data and AI into those workflows often means changing processes or building new ones entirely.
That said, once stakeholders see the quality of the outputs, with results strong enough to be accepted by peer-reviewed journals such as Nature and conferences such as ASCO, momentum builds. What starts as a proof of concept quickly evolves into a strategic initiative.
What’s next?
This work reflects more than a new approach to evidence generation; it hints at a broader shift in how pharma engages across the healthcare ecosystem. By using AI to uncover meaningful insights from everyday clinical data, it becomes possible to contribute earlier to the patient journey, in ways that support providers, inform payer decisions, and ultimately benefit patients.
It’s a quiet but important evolution: from observing care to thoughtfully shaping it.
Watch the Full Conversation
This Q&A captures only a portion of the discussion from the webinar “AI-Driven Clinical Intelligence in Support of Oncology and Population Health.” For deeper insights into the collaboration between Clinithink, AstraZeneca, and Barts Health—and what it means for the future of oncology—watch the full recording on demand: