
Better facts, better outcomes: How provenance improves reliability in healthcare AI

Whenever you’re communicating something important, at least three things matter: one, choosing precise language; two, being certain of the facts that support your conclusions; and three, not wasting time on irrelevant details.

These rules are especially important when choosing the best healthcare AI to support insights and innovation. As we covered in our last blog on granularity, clinical natural language processing (CNLP) offers benefits over general-purpose large language models (LLMs) in medical settings because of its speed and precision in extracting insights, particularly from unstructured clinical data like doctors’ notes.

So, granularity covers the precision part. But what about fact-checking your logic and filtering out irrelevant noise? You can probably recall a frustrating conversation with someone who didn’t support their claims or presented information that had nothing to do with what you were discussing. When these pitfalls appear in AI, you’re looking at what data scientists call hallucinations: inaccurate output produced by flaws in the language model’s reasoning and in how it was trained.

In this post we will cover how advanced AI systems use CNLP to tackle the hallucination problem, avoid propagating biased output, and provide the transparency needed to refine your AI model so it steadily improves.

The risks of LLM hallucinations in clinical settings

As AI continues its rapid ascent, significant concerns have emerged about the error rates of LLM-based solutions. Because these errors are baked into the statistical way LLMs generate output, they can quietly lead to poorly informed decisions. While some errors may be innocuous, others can be dangerously misleading, especially in clinical and life sciences settings where they masquerade as facts.

A recent study by Stanford University highlights the pervasive nature of hallucinations in AI language models, finding that even an advanced model like GPT-4 paired with retrieval augmented generation (RAG), an approach widely used for medical assessments, cannot reliably back up the claims it makes in its output. Specifically, the study found that 30% of individual output statements were unsupported, and nearly half of the responses contained at least one unsupported statement.1 This underscores the critical need for robust methodologies to mitigate such risks in high-stakes domains like healthcare.

Hallucinations arise when the AI model produces a contextually incorrect interpretation of syntactically correct text: the language structure is valid, but the interpretation is flawed, usually in ways that are opaque because the system leaves no trail for tracing its logic. In the healthcare domain, where consistency, accuracy, and traceability are paramount, this kind of error is unacceptable.

Applying CNLP for transparency and provenance in patient data

Most LLMs, which unlike CNLP-based systems aren’t trained on domain-specific knowledge, lack the real-world understanding to grasp semantics as they process information, relying instead on pattern recognition and word associations. Transparency, so key to scientific inquiry, is another casualty of this process. When a healthcare LLM outputs a result that effectively says, “This is the truth as I see it,” we’re at a loss to verify that truth because the LLM does not attribute its findings to an actual source, nor can we see what aspect of its training the model used to derive the result. In this way, LLMs forfeit provenance: the vital ability to quickly view the underlying data supporting a claim and pinpoint where a specific clinical concept was uncovered.

Tackling these challenges requires robust methodologies grounded in a deeply modeled ontology designed to understand and capture semantics, history, negation, and family references within the text it processes. By training CNLP on the extensive SNOMED CT repository, a comprehensive, multilingual clinical healthcare terminology, and incorporating human expertise to review and refine the model, data scientists can base their AI projects on superior modeling and understanding of typical clinical diction, speech, and grammar patterns. This approach also ensures that outputs are attributable to the source text, with solid provenance from SNOMED CT post-coordination, thereby avoiding plausible-sounding but unsupported outputs and other false starts.
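As a minimal illustration of what this kind of provenance can look like in practice, the Python sketch below shows a hypothetical annotation structure in which every extracted SNOMED CT concept carries the exact source span, negation status, and subject (patient versus family member) that produced it. The ConceptAnnotation class, its field names, and the sample note are illustrative assumptions rather than Clinithink’s actual data model; the SNOMED CT codes are standard concept identifiers.

```python
from dataclasses import dataclass

@dataclass
class ConceptAnnotation:
    """One extracted clinical concept plus the provenance that backs it up (illustrative)."""
    concept_id: str       # SNOMED CT concept identifier
    preferred_term: str   # human-readable label for the concept
    source_text: str      # exact span of the note that produced the match
    char_start: int       # character offsets of that span in the source document
    char_end: int
    negated: bool         # was the concept negated in context ("denies chest pain")?
    subject: str          # "patient" vs. "family" to capture family-history references

# Illustrative annotations for the note:
# "Father had a myocardial infarction; patient denies chest pain."
annotations = [
    ConceptAnnotation("22298006", "Myocardial infarction",
                      "myocardial infarction", 13, 34,
                      negated=False, subject="family"),
    ConceptAnnotation("29857009", "Chest pain",
                      "chest pain", 51, 61,
                      negated=True, subject="patient"),
]

# Provenance check: every extracted concept points back to a span of the original note.
for a in annotations:
    print(f"{a.preferred_term} ({a.concept_id}) <- '{a.source_text}' "
          f"[{a.char_start}:{a.char_end}], negated={a.negated}, subject={a.subject}")
```

With attributions structured this way, any downstream claim can be traced back to the exact words in the clinician’s note that support it.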

Avoiding bias and improving reliability

Another significant advantage of this CNLP-plus-SNOMED CT approach is its ability to avoid bias and the risk of compound hallucinations. When an AI system gets one thing wrong, it can perpetuate the error by basing further conclusions on the original faulty one, because such systems lack internal transparency and instead treat their own conclusions as the basis for new “truth.” In contrast, a well-designed CNLP model, with its structured understanding and rigorously transparent decision-making process, provides more reliable and verifiable outputs.

Fundamentally, CNLP exists to find and extract meaning from free text, while LLMs generate content based on words and prompts without a structural understanding of the meaning behind the content itself. The limits of training data in LLMs can lead to flawed outputs and hallucinations, which can then propagate via self-generated bias. But AI built on a strong foundation like the SNOMED CT ontology stands a far better chance of bypassing these issues to yield high-value insights.

And just as bias can propagate outward, data and decision quality can steadily improve when the process turns inward. The internal refinement enabled by using CNLP with SNOMED CT makes your healthcare AI continuously better through meaningful corrections grounded in a transparent process. When data scientists get feedback on a CNLP result, it is straightforward to look at the relevant attribution, check the model against the feedback, and adjust as needed.
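Continuing the hypothetical annotation structure sketched above, the example below shows how that feedback loop can work: because each output carries its attribution, a reviewer’s correction can be logged against the exact span and offsets that produced it, giving data scientists a precise target for adjustment. The function and field names here are illustrative, not part of any Clinithink API.

```python
# A minimal, hypothetical sketch of the feedback loop that attribution enables.

def log_correction(annotation: dict, reviewer_note: str, corrections: list) -> None:
    """Record a reviewer's correction together with the exact source span it concerns."""
    corrections.append({
        "concept_id": annotation["concept_id"],
        "disputed_span": annotation["source_text"],
        "offsets": (annotation["char_start"], annotation["char_end"]),
        "reviewer_note": reviewer_note,
    })

# An output the model produced, with its attribution attached.
annotation = {
    "concept_id": "22298006",              # SNOMED CT code for myocardial infarction
    "source_text": "myocardial infarction",
    "char_start": 13,
    "char_end": 34,
    "subject": "patient",                  # the model's (incorrect) reading of the note
}

corrections: list = []
# A clinician flags that the note describes the father's history, not the patient's.
log_correction(annotation, "Family history, not patient history", corrections)

# Because the disputed span is pinned to character offsets in the original note,
# the fix can target exactly the context handling that misread the subject.
print(corrections[0])
```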

At Clinithink, the substantial training we apply to vast quantities of clinical data, combined with this level of transparency, continually improves our model, producing accurate insights and enabling efficient growth. We’re committed to harnessing AI responsibly, leveraging its potential while ensuring trust, safety, and ethical integrity. By relying on strong attributions and traceability, we strive to eliminate healthcare data hallucinations while continuously enhancing our AI models.


1 Generating Medical Errors: GenAI and Erroneous Medical References

Learn more about responsible AI in healthcare

Published by Clinithink December 5, 2024