Hidden dangers: Addressing the scourge of bias in healthcare AI

Written by Clinithink | Dec 12, 2024 6:00:00 PM

As AI adoption has accelerated across industries, including healthcare, a new concern has emerged: bias in the AI models themselves. AI bias stems from how these systems are trained, because a model is only as good as its training and the data behind it. AI itself is a statistical model with no awareness of its content or output behavior, but we humans are aware of ours. We are therefore obliged to use that awareness to understand the training any model has received, so we can be sure we’re using the right model for the right purpose.

AI systems rely on contextual data during training to recognize conditions and patient history, so that the model can generate the right output from the input it is given. Keeping that training as free from bias as possible is key to ensuring the output is as useful and meaningful as possible to the patient and the clinical team.

But training AI models is a challenging task, fraught with the risks of incomplete or poor-quality data. One factor that diminishes the value of a model’s output is biased language in the training data that produced it. To safeguard against this, let’s take a closer look at how effective data modelling works.

The roots of AI bias

A well-trained healthcare or life science AI model uses data to paint a picture of a patient’s situation based on comparisons to many other patients’ histories, conditions, and treatments. This interpolated picture is never perfect, but it aims to connect the dots between many data points and draw lines that together create a rough “wireframe” view of the patient, so that the shape of their story and the trajectory of their journey are evident at a useful level of refinement. The practical goals of this interpolation are to save each clinician time and to sharpen their understanding of where a patient is or was in their journey, and what is being done to improve their outcomes.

Wireframes, however, typically lack the depth, color, and detail of an actual painting. A rough outline may look like a car, an airplane, or a building, yet lack the features that make it real rather than just an approximate representation. The challenge of errors and hallucinations, as illustrated in our previous blog post, Better facts, better outcomes: How provenance improves reliability in healthcare AI, further exacerbates this issue.

Similarly, building a reliable AI model means recognizing that the training data itself may contain biases that distort the patient picture rather than bringing it into focus. Bias in healthcare and life science AI happens when training data and methods exclude categories of features about certain types of patients due to unhelpful assumptions in the source documentation used for training. Besides exclusions, the training data may also contain other information that is false or misleading.

The authors of a March 2022 National Institute of Standards and Technology paper, Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, describe it this way:

Even when datasets are representative, they may still exhibit entrenched historical and systemic biases, improperly utilize protected attributes, or utilize culturally or contextually unsuitable attributes. Developers sometimes exclude protected attributes, associated with social groups which have historically been discriminated against. However, this does not remedy the problem, since the information can be inadvertently inferred in other ways through proxy or latent variables. Latent variables such as gender can be inferred through browsing history, and race can be inferred through zip code. So models based on such variables can still negatively impact individuals or classes of individuals.

Citation: S. Barocas and A. D. Selbst, “Big Data’s Disparate Impact,” California Law Review, vol. 104, no. 3, pp. 671–732, 2016, publisher: California Law Review, Inc. [Online]. Available: https://www.jstor.org/stable/24758720
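
To make the proxy-variable point above concrete, here is a minimal, purely synthetic sketch (the dataset, zip-code ranges, and 90% concentration figure are all invented for illustration): even when a protected attribute is excluded from the features, a simple model can often recover it from a correlated proxy such as zip code, which is why dropping the attribute alone does not remove the bias.

```python
# Illustrative only: synthetic data showing how an excluded protected
# attribute can still be inferred from a proxy variable (zip code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical protected attribute (0/1), deliberately left out of the model's features.
group = rng.integers(0, 2, size=n)

# Zip code acts as a proxy: 90% of group 1 lives in codes 100-149,
# 90% of group 0 in codes 150-199, the rest are spread uniformly.
zip_code = np.where(
    rng.random(n) < 0.9,
    np.where(group == 1, rng.integers(100, 150, n), rng.integers(150, 200, n)),
    rng.integers(100, 200, n),
)

X_train, X_test, y_train, y_test = train_test_split(
    zip_code.reshape(-1, 1).astype(float), group, random_state=0
)

# A model trained only on the proxy recovers the "excluded" attribute.
clf = LogisticRegression().fit(X_train, y_train)
print(f"accuracy inferring protected group from zip code alone: "
      f"{clf.score(X_test, y_test):.2f}")  # well above the 0.50 chance level
```

The same logic applies to the other proxies the NIST authors mention, such as browsing history standing in for gender; an audit that probes whether protected attributes are recoverable from the remaining features is one way to detect this kind of leakage.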

Why does this occur? It’s usually the unintentional result of overlooking the detailed relationships within the source documents themselves. Scrutiny of AI source content commonly reveals that a supposedly random set of documents drawn from populations of patients may not actually be random at all. Some patient documents will mislead AI training through unexpected variability, a lack of thoroughness, or overall poor documentation quality. These faults in turn can stem from the training of a patient’s clinician and the location where they were treated, along with various patient demographic factors. Other documents may not be clinical at all but are instead drawn from news or social media postings.
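
One practical safeguard is a representativeness audit of the document corpus before training begins. The sketch below is a hypothetical, simplified example (the group labels, reference shares, and 20% threshold are assumptions for illustration, not a description of any specific pipeline): it compares each patient group’s share of the training documents with its share of the population the model is meant to serve, and flags groups that fall short.

```python
# Minimal, hypothetical corpus audit: flag patient groups whose share of the
# training documents falls well below their share of the reference population.
from collections import Counter

# Each training document tagged with a (hypothetical) patient-group label.
train_docs = [
    {"doc_id": 1, "group": "urban_teaching_hospital"},
    {"doc_id": 2, "group": "urban_teaching_hospital"},
    {"doc_id": 3, "group": "rural_clinic"},
    {"doc_id": 4, "group": "urban_teaching_hospital"},
    # ... in practice, thousands of documents
]

# Assumed reference shares for the population the model will serve.
reference_share = {"urban_teaching_hospital": 0.55, "rural_clinic": 0.45}

counts = Counter(doc["group"] for doc in train_docs)
total = sum(counts.values())

for group, expected in reference_share.items():
    observed = counts.get(group, 0) / total
    ratio = observed / expected if expected else float("inf")
    if ratio < 0.8:  # arbitrary threshold: under-represented by more than 20%
        print(f"WARNING: {group} is under-represented "
              f"({observed:.0%} of corpus vs {expected:.0%} of population)")
```

In practice, the same check can be run across sites, document types, and authoring clinicians to surface the kinds of skew described above.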

The result is a less valuable set of inputs to the AI model because of inherent bias in the data, and in turn a much less representative and accurate output. AI bias can produce serious detrimental effects downstream in the treatment process, especially when it causes certain racial, social, gender, or income groups to receive a lower quality of care than they would otherwise have been afforded. By warping model output, bias can negatively affect efficacy, outcomes, and medical decision making throughout a patient’s treatment journey.

AI bias in training models can also include:

  • Authoring bias. The providers and other professionals who write the source documents used for healthcare AI training can also introduce bias, often without meaning to do so. This has an immediate downstream effect on LLM training, affecting the reliability and quality of the resulting models. Because it enters early in the process, through the training data that was selected, the bias is baked into the model. The fact that much LLM training data is drawn from readily available internet content adds further risk, as that data likely contains non-medical as well as medical information and can include highly subjective biases around race, income, geography, and politics.
  • Shortcut bias. Another type of training bias that can undercut the value of AI occurs when the model trainer opts for quick interpolations based on data summaries (or worse, summaries of summaries) instead of the deeper knowledge required to make accurate interpretations. For example, some AI systems are trained on data sets such as medical test questions rather than the medical texts themselves, in an attempt to get directly to a quick answer; this prevents the model from acquiring the root knowledge that supports true medical understanding. In chasing the quickest answer, interpretation is replaced with interpolation, producing insufficient, inaccurate, and sometimes completely misleading results. Combined, these errors and biases lead to shallow, formulaic answers that give the appearance the system was “taught to the test.” And that is literally what occurred: the model’s capabilities are limited because it was trained on knowledge that can only be interpolated from the reduced set of training data it received.

Dissolving training assumptions to ensure more equitable care

For LLMs, a model is only as good as the data that goes into it, and the lack of pure, unbiased medical source documentation inherently hobbles LLM training right from the start. Clinical natural language processing (CNLP) offers a different approach, as described in the blog post that kicked off this series, The truth about AI in healthcare: How CNLPs and LLMs differ in delivering improved outcomes.

Clinithink’s CNLP-based AI model leverages SNOMED CT, a robust clinical ontology, to avoid the bias traps discussed in this post. By focusing on the phrase structure of medical concepts rather than historical internet data or clinician dictations, our model mitigates the issues of biased LLM training. SNOMED CT’s concepts are inherently bias-free because they reflect universal human biological components (such as the skeletal system, neurological pathways, respiratory functions, cardiac structures, and biomechanical elements) regardless of race, ethnicity, gender, income, or other socio-economic factors. This standards-based approach significantly reduces the risk of bias being introduced into the model.
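
As a rough illustration of this concept-centric idea (a simplified sketch, not Clinithink’s actual CNLP engine), the snippet below normalizes a free-text clinical phrase and resolves it to a SNOMED CT concept identifier; the tiny lookup table and synonym list stand in for a full ontology-backed parser, and the concept IDs are shown for illustration only.

```python
# Illustrative sketch only: map normalized clinical phrases to SNOMED CT
# concept IDs. A real CNLP engine parses phrase structure against the full
# ontology; this toy lookup just shows the concept-centric idea.
import re

# Tiny, hand-picked subset of SNOMED CT concepts (IDs shown for illustration).
SNOMED_SUBSET = {
    "myocardial infarction": "22298006",
    "diabetes mellitus": "73211009",
    "asthma": "195967001",
}

# A few surface forms that should resolve to the same underlying concept.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
}

def to_concept(phrase: str) -> str | None:
    """Normalize a clinical phrase and return a SNOMED CT concept ID, if known."""
    norm = re.sub(r"[^a-z ]", "", phrase.lower()).strip()
    norm = SYNONYMS.get(norm, norm)
    return SNOMED_SUBSET.get(norm)

for text in ["Heart attack", "myocardial infarction", "Asthma", "flu"]:
    print(f"{text!r} -> {to_concept(text)}")
```

Because the keys are clinical concepts rather than free text drawn from any particular population, the same mapping applies regardless of who authored the note or where it was written, which is the intuition behind the ontology-based approach described above.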

Learn more about responsible AI in healthcare