As AI adoption has accelerated across industries, including healthcare, a new concern has arisen: bias in the AI models themselves. AI bias stems from how AI systems are trained, because these systems are only as good as their training and the data behind it. An AI model is a statistical system with no awareness of its content or output behavior, but we humans are aware of ours. We are therefore obliged to use that awareness to understand and appreciate the training any model has received, so we can be sure we're using the right model for the right purpose.
AI models are trained on contextual data to recognize conditions and history, so that they can generate the right output from the input provided. Ensuring that this output is appropriate means keeping it as free from bias as possible, so that it is as useful and meaningful as it can be to the patient and the clinical team.
But training AI models is a challenging task, fraught with the risks of incomplete or poor-quality data. One factor that diminishes the value of a model's output is training data that contains biased language. To safeguard against this, let's take a closer look at how effective data modelling works.
The roots of AI bias
A well-trained healthcare or life science AI model uses data to paint a picture of a patient’s situation based on comparisons to many other patients’ histories, conditions, and treatments. This interpolation of a picture is never perfect, but it aims to connect the dots between many data points and draw lines that together create a rough “wireframe” view of the patient so that the shape of their story and the trajectory of their journey are evident at a useful level of refinement. The practical goals of this interpolation are to save each clinician’s time and increase the accuracy of their understanding of where a patient is or was in their journey, and what they are doing to improve their outcomes.
Wireframes, however, typically lack the depth, color, and detail of an actual painting. While a rough outline may look like a car, an airplane, or a building, it may also lack the features that make it real rather than just an approximate representation. The challenge of errors and hallucinations, as illustrated in our previous blog post, Better facts, better outcomes: How provenance improves reliability in healthcare AI, further exacerbates this issue.
Similarly, building a reliable AI model means recognizing that the training data itself may contain biases that distort the patient picture rather than bringing it into focus. Bias in healthcare and life science AI happens when training data and methods exclude categories of features about certain types of patients due to unhelpful assumptions in the source documentation used for training. Besides exclusions, the training data may also contain other information that is false or misleading.
The authors of a March 2022 National Institute of Standards and Technology paper, Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, describe it this way:
Even when datasets are representative, they may still exhibit entrenched historical and systemic biases, improperly utilize protected attributes, or utilize culturally or contextually unsuitable attributes. Developers sometimes exclude protected attributes, associated with social groups which have historically been discriminated against. However, this does not remedy the problem, since the information can be inadvertently inferred in other ways through proxy or latent variables. Latent variables such as gender can be inferred through browsing history, and race can be inferred through zip code. So models based on such variables can still negatively impact individuals or classes of individuals.
Citation: S. Barocas and A. D. Selbst, “Big Data’s Disparate Impact,” California Law Review, vol. 104, no. 3, pp. 671–732, 2016, publisher: California Law Review, Inc. [Online]. Available: https://www.jstor.org/stable/24758720
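To make the proxy-variable problem concrete, here is a minimal sketch using entirely synthetic data. The zip codes, group labels, and treatment rates are invented for illustration; the point is only that a model trained without a protected attribute can still reproduce a historical disparity through a correlated proxy.

```python
# Illustrative sketch (synthetic data): dropping a protected attribute does not
# remove bias when a proxy variable such as zip code still encodes it.
import random

random.seed(0)

# Hypothetical setup: two zip codes whose populations differ by a protected
# attribute, and historical outcomes that were skewed by group.
records = []
for _ in range(1000):
    group = random.choice(["A", "B"])
    zip_code = "94001" if group == "A" else "94002"  # proxy: zip correlates with group
    # Biased historical label: group B was under-treated in the source data.
    treated = random.random() < (0.8 if group == "A" else 0.4)
    records.append({"zip": zip_code, "group": group, "treated": treated})

# Train a "model" that never sees the protected attribute -- only the zip code.
# Here the model is simply the per-zip treatment rate learned from history.
rates = {}
for z in ("94001", "94002"):
    subset = [r for r in records if r["zip"] == z]
    rates[z] = sum(r["treated"] for r in subset) / len(subset)

print(rates)  # roughly {'94001': 0.8, '94002': 0.4} -- the historical disparity
              # is reproduced through the proxy even though 'group' was excluded
```

Even though the protected attribute never appears in the model's inputs, its footprint survives in the proxy, and the model carries the historical disparity forward.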
Why does this occur? It's usually the unintentional result of overlooking the detailed relationships within the source documents themselves. Scrutiny of AI source content commonly reveals that a supposedly random set of documents drawn from a population of patients may not actually be random at all. Some patient documents will mislead AI training through unexpected variability, a lack of thoroughness, or overall poor documentation quality. These faults can in turn stem from the training of a patient's clinician and the location where they were treated, along with various patient demographic factors. Other documents may not be clinical at all, but instead are drawn from news or social media postings.
The result is a less valuable set of inputs to the AI model, biased at the source, and in turn a far less representative and accurate output from the model. AI bias can produce serious detrimental effects downstream in the treatment process, especially when it causes certain racial, social, gender, or income groups to receive a lower quality of care than they would otherwise have been afforded. By warping model output, bias can negatively affect efficacy, outcomes, and medical decision making throughout a patient's treatment journey.
Dissolving training assumptions to ensure more equitable care
For LLMs, a model is only as good as the data that goes into it, and the lack of pure, unbiased medical source documentation hobbles LLM training right from the start. Clinical natural language processing (CNLP) offers a different approach, as described in the blog post that kicked off this series, The truth about AI in healthcare: How CNLPs and LLMs differ in delivering improved outcomes.
Clinithink's CNLP-based AI model leverages SNOMED CT, a robust clinical ontology, to avoid the bias traps discussed in this post. By focusing on the phrase structure of medical concepts rather than historical internet data or clinician dictations, our model mitigates the problems of biased LLM training. SNOMED CT's concepts are inherently bias-free, as they describe universal human biology (such as the skeletal system, neurological pathways, respiratory functions, cardiac structures, and biomechanical elements) regardless of race, ethnicity, gender, income, or other socio-economic factors. This standards-based approach significantly reduces the risk of bias entering the model in the first place.
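As a simplified illustration of what concept-level normalization looks like (a sketch for this post, not Clinithink's implementation), consider mapping differently worded clinical phrases to the same SNOMED CT concept identifiers:

```python
# Minimal illustration: normalizing varied clinical phrasings to SNOMED CT
# concept identifiers so downstream models reason over standardized concepts
# rather than raw, style-dependent text. Real CNLP uses far richer matching;
# the lookup table below is hypothetical and deliberately tiny.
PHRASE_TO_CONCEPT = {
    "heart attack": 22298006,           # Myocardial infarction
    "myocardial infarction": 22298006,  # same concept, different surface form
    "diabetes": 73211009,               # Diabetes mellitus
    "diabetes mellitus": 73211009,
    "high blood pressure": 38341003,    # Hypertensive disorder
    "hypertension": 38341003,
}

def encode(note: str) -> set[int]:
    """Return the set of SNOMED CT concept IDs mentioned in a clinical note."""
    text = note.lower()
    return {cid for phrase, cid in PHRASE_TO_CONCEPT.items() if phrase in text}

# Two differently worded notes resolve to the same concept set.
print(encode("Pt c/o high blood pressure; prior heart attack in 2019."))
print(encode("History of hypertension and myocardial infarction."))
```

Because both notes resolve to the same concepts, variation in documentation style and phrasing contributes less noise, and less bias, to the model's inputs.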
Learn more about responsible AI in healthcare