Moving to the Forefront of Clinical Research – The Newcastle Story

In this film the research team at Newcastle upon Tyne Hospitals NHS Foundation Trust (one of the UK’s leading clinical research centres) explains the impact Clinithink’s CLiX technology has had on the site’s recruitment levels and commercial activity, and offers insight into the further value added for patients and staff.

Clinithink’s journey into AI

Sarah Beeby, EVP, explains how it all began

Sarah took part in a session at the World Medical Innovation Forum (WMIF) to explain how Clinithink’s NLP technology has impacted the pharmaceutical and diagnostic realms. The journey is very interesting as the company was started by two disenchanted doctors who wanted to do more with the data they had.

In her talk she explains why this came about and some of the areas where technology can aid insight. She covers some of the cases where it is helping to match drugs to patients and provide better healthcare.

It’s all about speed, insights, and ultimately better healthcare.

View Sarah’s video from WMIF below:

The company you keep…

The Bio IT World Conference & Expo has a long-standing reputation as the premier event for showcasing IT and informatics applications and enabling technologies that drive biomedical research, drug discovery & development and clinical and healthcare initiatives.

We are very excited about one of the featured sessions taking place on Wednesday 13th March at 11.40am. The session ‘Diagnosing Rare Disease Patients: Progress in Fully Automated Diagnosis’ is being delivered by Tom Defay, Senior Director, R&D Strategy and Alliances, SPMD, Strategy, Program Management and Data Sciences, Alexion.

You may remember we were part of a recent collaboration, that also included Alexion Pharma, which saw the group achieve a GUINNESS WORLD RECORDS title for the ‘fastest genetic diagnosis’. The project successfully compressed the time needed to diagnose rare genetic disorders in newborns through DNA sequencing to 19.5 hours, setting a new GUINNESS WORLD RECORDS title in a seamless end-to-end process.

Our patented CLiX natural language processing (NLP) solution was needed for two key activities in the process: quickly combing through electronic medical records to extract crucial phenotype information to then compare with over 12,000 phenotypes (plus an additional 15,000 synonyms) describing the characteristics of thousands of rare diseases. The solution can perform both of these tasks in seconds compared to the hours or days it would take a highly skilled specialist physician to do. We understand that Tom will be referencing this project in his session.

It was a great project and one we were incredibly proud to be part of: not only did it prove the value of our technology, but the humbling reality is that this work can help save the lives of children with rare diseases.

Clinithink receives RARE Champion of Hope award for their ‘notable efforts’ in rare disease

Clinithink has been selected to receive the RARE Champion of Hope – Collaborations in Science and Technology award at the 7th Annual RARE Patient Advocacy Summit on October 4th in Irvine, California.


Over 200 individuals and organizations worldwide were nominated by their peers for a RARE Champion of Hope award for their notable efforts in rare disease advocacy, teen advocacy, science, medical care and treatment and collaborations.


Along with their collaborators Rady Children’s Institute for Genomic Medicine and Alexion Pharmaceuticals, Clinithink was selected to receive the award because of the work which resulted in the successful compression of the time needed to diagnose rare genetic disorders in newborns through DNA sequencing to 19.5 hours.


Clinithink’s CLiX solution has been shown to dramatically accelerate the diagnosis of rare disease. This was evidenced in the Rady-Alexion-Clinithink collaboration, for which the RARE Champion of Hope has been awarded, where Clinithink’s NLP solution was used for two key activities in the process:

  1. quickly combing through electronic medical records to extract crucial phenotype information
  2. comparing the extracted phenotype with over 12,000 phenotypes (plus an additional 15,000 synonyms) describing the characteristics of thousands of rare diseases.
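The two activities above can be pictured with a small sketch. This is not how CLiX works internally (the source does not describe its algorithms); it is a minimal illustration, assuming phenotype terms have already been extracted from the record, of resolving synonyms and then ranking candidate diseases by set overlap. The synonym table, disease names and the Jaccard scoring rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disease:
    name: str
    phenotypes: frozenset  # canonical phenotype terms describing the disease


# Hypothetical synonym table: free-text variant -> canonical phenotype term.
SYNONYMS = {
    "fits": "seizures",
    "convulsions": "seizures",
    "low muscle tone": "hypotonia",
    "floppy infant": "hypotonia",
}


def normalise(term: str) -> str:
    """Lower-case an extracted term and resolve it to its canonical form."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def rank_diseases(extracted_terms, catalogue):
    """Return (disease name, score) pairs, best match first.

    The score is the Jaccard overlap between the patient's phenotype set and
    the disease's phenotype set. Diseases with no overlap are dropped.
    """
    patient = {normalise(t) for t in extracted_terms}
    scored = []
    for disease in catalogue:
        overlap = patient & disease.phenotypes
        if overlap:
            union = patient | disease.phenotypes
            scored.append((disease.name, len(overlap) / len(union)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Illustrative catalogue only; a real one would hold thousands of entries.
CATALOGUE = [
    Disease("Disease A", frozenset({"seizures", "hypotonia", "lactic acidosis"})),
    Disease("Disease B", frozenset({"seizures", "microcephaly"})),
    Disease("Disease C", frozenset({"hepatomegaly"})),
]

ranking = rank_diseases(["Fits", "floppy infant"], CATALOGUE)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the hard part is the first activity, reading free-text notes reliably, which this sketch deliberately skips by starting from already-extracted terms.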


The Clinithink solution can perform both of these tasks in seconds compared to the hours or days it would take a highly skilled specialist physician to do.


The diagnosis of rare disease is incredibly challenging and, if left undiagnosed, these conditions are very expensive to manage and often life limiting. Genomics data has held great promise and, increasingly, evidence is suggesting that phenotype is at least as important as genotype in diagnosis and the development of new treatments. However, to unlock the true value of such data, AI tools such as Clinithink’s patented CLiX natural language processing (NLP) solution are essential.


Sarah Beeby, SVP, Life Sciences, said: “Our work with Rady and Alexion has enabled us to evidence the value of our technology and, more importantly, be involved in projects that can help save the lives of those with rare diseases.


“Narrative data is a valuable asset in healthcare but is largely inaccessible due to its lack of structure as existing technologies rely on structured data or key word searches. Our CLiX technology can ‘read’ thousands of clinical documents an hour and extract the information they contain. This has significant benefit in the clinical setting and is also very powerful for clinical trials as it can optimize the development process along the entire continuum.”


Clinithink CEO, Dr Chris Tackaberry said: “We are absolutely delighted to be recognised together with our collaborators in the RARE Champion of Hope awards and very much look forward to meeting colleagues and peers involved in this exciting space at the RARE Patient Advocacy Summit in October.”



Sarah Beeby joins Clinithink to champion its life sciences business

Sarah Beeby has joined Clinithink as Senior Vice President of Life Sciences, where she will be responsible for introducing new technology, in particular AI, to life sciences projects.

Sarah has over two decades of experience in the global life sciences space, having worked in a range of diverse roles. She also has a wide range of operational, therapeutic and regulatory expertise, with a focus on collaborative working opportunities to enhance delivery and patient experience. Sarah’s experience as a management team member at exec and board level has covered investment, strategy, operational efficiency and the delivery of products and services.


Prior to joining Clinithink, Sarah was MD of Synexus, where she utilised technology and collaborative working relationships to enhance delivery, profitability and the patient experience in research. She was also pivotal in negotiating a contract between Synexus and the Northumbria Healthcare NHS Foundation Trust. Together they established the North East Clinical Research Centre at Hexham General Hospital, a particularly exciting partnership as it was the first of its kind between a private company and the NHS.

Discussing the new role Sarah said: “We hear a lot about the potential of technology and how disruptive technology can bring about a host of benefits. Often, however, there is a disconnect between what is possible and what the market needs. I am really excited to join Clinithink as they are at the forefront of what is possible. Recently our software helped achieve a GUINNESS WORLD RECORDS title in partnership with the Rady Institute for Genomic Medicine for the ‘Fastest genetic diagnosis’ by successfully compressing the time needed to diagnose rare genetic disorders in newborns through DNA sequencing. Not only did this exciting project prove the value of our technology, the humbling reality is that this work can help save the lives of children with rare diseases. That really isn’t a bad day at the office.”

Clinithink’s patented CLiX natural language processing (NLP) solution was needed for two key activities in the process: quickly combing through electronic medical records to extract crucial phenotype information to then compare with over 12,000 phenotypes (plus an additional 15,000 synonyms) describing the characteristics of thousands of rare diseases. The solution can perform both of these tasks in seconds compared to the hours or days it would take a highly skilled specialist physician to do.

Narrative data is a valuable asset in healthcare but is largely inaccessible due to its lack of structure, as existing technologies rely on structured data or key word searches. CLiX can ‘read’ thousands of clinical documents an hour and extract the information they contain. As shown with the Rady collaboration, this has significant benefit in the clinical setting and is also very powerful for clinical trials, as it can optimize the development process along the entire continuum – site selection, protocol optimization and enrolment. In simple terms, our software can automate the bulk of pre-screening and site feasibility, which is still undertaken manually. By targeting very specific patient inclusion/exclusion criteria, even down to specific start and stop events, varying dose levels and thousands of other variable criteria, software can be used to speed up clinical trials.

Sarah Beeby continued: “As well as being an incredibly empowering project to be involved with, it also enabled us to put CLiX through its paces and be even more confident in its ability. By using CLiX in the clinical trial setting we can offer significant de-risking for the program and potentially significant savings in development time by decreasing the total enrolment time, reducing protocol amendments and improving data robustness for the next development stage. As we all know, time is a big factor, with a significant number of trials failing to meet enrolment timelines.”

Clinithink CEO, Chris Tackaberry said: “Sarah has huge experience in the life science industry and we are really excited that she has chosen to join Clinithink as the executive leading our life science business. We are confident we can make a real difference in this space and have already seen exciting traction since Sarah has joined us.”




Clinithink plays key role in achieving new world record for rapid diagnosis of rare genetic disorders

It’s not often in healthcare technology that we see a dramatic and immediate impact on a clinical outcome that results directly from the use of software. Often the benefits we ‘IT folks’ deliver are subtle and long term, reducing cost and improving efficiency. While these are worthwhile and create value, those of us working in the field dream of creating software that will deliver a significant and direct patient benefit. I am extremely proud to report that, through our collaboration with Rady Children’s Institute for Genomic Medicine (RCIGM), San Diego, CA and its partners we have realised that dream.

In this project we worked with RCIGM, one of the world’s leading genomics institutes, along with colleagues at Alexion, Illumina and a number of other technology innovators in the genomics space. As part of a seamless end-to-end process, the project successfully compressed the time needed to diagnose rare genetic disorders in newborns through DNA sequencing to less than a day, 19.5 hours to be precise, setting a new GUINNESS WORLD RECORDS® title.

This is an astonishing achievement. But to the children, their parents and the staff looking after them, this speed really, really matters because of the huge potential to improve the outcome made possible by faster diagnosis.

At Clinithink we are absolutely delighted that our patented CLiX natural language processing (NLP) solution played a key role in this success. CLiX was needed for two key activities in the process: quickly combing through electronic medical records to extract crucial phenotype information to then compare with over 12,000 phenotypes (plus an additional 15,000 synonyms) describing the characteristics of thousands of rare diseases. The solution can perform both of these tasks in seconds compared to the hours or days it would take a highly skilled specialist physician to do. Not only did this exciting project prove the value of our technology, the humbling reality is that this pioneering work can help save the lives of children with rare diseases.

If you are unfamiliar with this space (and I know I was) have a look at this short video to understand why this is so important and read this press release from our partners.


Clinithink: see the wonder in the detail

Patient Recruitment & Technology: Silver Bullet or Good as Gold?

More research stakeholders are incorporating technology in some form into their patient recruitment strategies to help teams meet accrual targets. And while automating this largely manual process has made a positive impact, it hasn’t solved the long-standing problem of falling significantly short of recruitment targets.

In a survey published by Applied Clinical Trials entitled: Barriers to Clinical Trial Recruitment and Possible Solutions: A Stakeholder Survey, the authors gathered insights from clinical trial sponsors, research sites and patient advocacy groups to understand why so many clinical trials fail to recruit enough eligible patients and what can be done about it.

The survey found that finding eligible patients remains the number one hurdle to recruitment. Additionally, the use of technology to search medical records, registries, or databases to find suitable patients has resulted in varying degrees of success. But respondents were cautious about attributing silver bullet status to the use of technology to find patients, stressing the importance of ‘thoughtful site selection, feasibility testing, and development of recruitment strategies with realistic timelines and goals’ in overcoming other significant barriers to recruitment.

Like Gold Dust

It seems logical for any recruitment strategy to use technology to search medical records to identify patients, but what if it’s only half of what technology could be doing? Most technologies will only search structured data within EMRs, hospital-based registries and other databases – essentially, billing codes already considered unreliable for providing any clinical value. So the results of those searches are variable and don’t alleviate enough of the time and cost involved in screening identified patients.

What you want is to search all of the data stored in patient records, particularly the unstructured clinical narrative notes that give you the whole patient picture and contain the majority of the information you need to determine a patient’s eligibility. This is only possible using Clinical Natural Language Processing (CNLP) technology – a technology that preserves the context of a doctor’s notes about a patient and adds important detail to a patient’s history and health.

Clinithink’s CLiX ENRICH software solution maximizes the unique benefits of CNLP to give sites access to the valuable information stored as clinical narrative. Sites that use CLiX ENRICH have reported in case studies that they’re able to search hundreds of thousands of patient documents in a matter of hours to find 10X more patients than currently possible. Also, CLiX ENRICH allows sites to input trial-specific inclusion and exclusion criteria to return a list of patients, ranked in order of eligibility, to drastically accelerate patient screening and enrollment. This is a significant step in the use of technology to have a measurable impact on patient recruitment.

The Silver Lining

‘Automate to accelerate’ will soon become a primary vetting criterion for solutions that are expected to improve patient recruitment, but what could this mean for patient recruitment in the real world? Consider the paradigm shift across the entire clinical trials industry if, at the click of a button, sites could generate a pre-screened list of eligible patients in their patient population, thereby negating Lasagna’s Law and enabling them to use their data to know which trials to take on. This is a game changer for feasibility for the site and for site selection for the sponsor or CRO.

Similarly, patient recruitment strategies will pivot to focus less on the barriers to enrollment to concentrate research team time on patient education and retention activities, strengthening relationships with referring clinicians and taking on more trials with the same number of staff. By implementing CLiX ENRICH as an integral part of a recruitment strategy, sites are able to solve insufficient enrollment and numerous associated challenges, such as:

  • Streamlining and accelerating the screening process
  • Improving study planning
  • Decreasing the time the research team spends finding patients
  • Making chart reviews faster and more cost effective
  • Decreasing the number of screen failures
  • Basing feasibility on data

It’s true that no one solution will solve all the complex challenges of patient recruitment for clinical trials. However, innovative technology solutions can solve numerous challenges with benefits across your entire clinical trial enterprise. Do you want to know more about how you can use CLiX ENRICH to help you reach your trial targets? Contact us for a free demo.

Waiver of HIPAA Authorization: Meeting IRB Requirements

Protecting the patients who participate in clinical trials is a legal requirement and a top priority for owners of e-health databases. Health systems, academic medical centers and other healthcare providers are looking for ways to make compliance with HIPAA and the Common Rule easier, while at the same time harnessing the e-health information in their databases to identify potential clinical trial participants.

As primary protectors of patient safety and privacy, IRBs play a key role in meeting these two objectives, but getting a waiver of Authorization of the Rule’s informed consent requirements means proving that there is a low level of risk that patients’ privacy would be breached. However, most owners of e-health databases don’t have the in-house technical tools to quickly and easily scan patient information. More often than not, they require the implementation of a third party solution. This is typically considered another layer of complexity in leveraging health data, particularly with the advent of accessing and using unstructured clinical narrative. However, it really isn’t that complex. Think of it as simply having a computer automate the pre-screening of patient records instead of having humans review them manually. In addition, for many years now, sites have been using structured or discrete data sets, which include Protected Health Information (PHI), to search for patients meeting clinical trial criteria.

CLiX ENRICH is a software application that allows organizations to search and interrogate their e-health databases to identify eligible clinical trial participants within the legal framework of both the HIPAA Privacy Rule and the Common Rule. It gives IRB members the confidence that PHI is safeguarded and simplifies the approval process for waivers.

Privacy as a priority

Simply put, CLiX ENRICH is installed in the organization’s own environment and stays there. Unlike other third-party solutions, absolutely no e-health data is shared with the owners of CLiX ENRICH or any third party whatsoever.

Think of a health system’s e-health database as a library and its employees in charge of data management as its librarian. CLiX ENRICH empowers the librarian to identify potential participants without checking data out of the library and without inviting anyone into the library. The organization keeps its data in its library and most importantly, identifiable private patient information never leaves the site.

This in itself negates the risk associated with hiring an outside party or outsourcing the analysis of health information stored in the site’s databases or medical records. But before the clinical use of Natural Language Processing (NLP) technology to review health data, accessing data by a third party was unavoidable because health systems alone lacked the ability to identify potential subjects from the library’s private patient information.

Empowering stakeholders safely

Any time an outsider is given access to a patient’s records there is risk of inappropriate disclosure. In these situations IRBs were reluctant to grant waivers of the informed consent requirement to allow the researcher to review health information without patient consent.

Because CLiX ENRICH removes the need for an outsider and no identifiable private information is provided to Clinithink, patient privacy and accountability for that privacy are much more straightforward for an IRB processing a request for a waiver.

CLiX ENRICH is a self-use, HIPAA compliant tool that automates the cumbersome, time consuming task of searching databases and medical records for potential patients to recruit into clinical trials. Users of CLiX ENRICH, such as PIs, research coordinators and others, use it as a workflow tool that automates the pre-screening process to generate a list of qualified potential participants against trial-specific inclusion and exclusion criteria – no other processes are changed.

Compliance before approval

In generating the pre-screened list of patients, stakeholders tell CLiX ENRICH what to find; for example, ‘women age 45 and older who have co-morbidities involving hypertension and diabetes’.

CLiX ENRICH takes those criteria, or queries, searches the e-health information in the library for patients who meet the parameters, and produces a list of possible study participants that can be de-identified or used as is.
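To make the idea of a query concrete, here is a minimal sketch of the example criteria (‘women age 45 and older who have co-morbidities involving hypertension and diabetes’) applied to a handful of made-up records. It is an illustration only, not the CLiX ENRICH query interface: the `PatientRecord` fields, the function names and the placeholder labelling are all assumptions, and real de-identification under HIPAA involves far more than swapping identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    sex: str            # "F" or "M" in this toy data set
    age: int
    conditions: set = field(default_factory=set)


def matches_query(record, sex, min_age, required_conditions):
    """True when the record satisfies every part of the query."""
    return (
        record.sex == sex
        and record.age >= min_age
        and required_conditions <= record.conditions  # subset test
    )


def find_candidates(records, sex, min_age, required_conditions, de_identify=False):
    """Return matching patient IDs, or placeholder labels if de_identify is set."""
    hits = [r for r in records if matches_query(r, sex, min_age, required_conditions)]
    if de_identify:
        # Toy stand-in only: positional labels instead of real identifiers.
        return [f"candidate-{i}" for i, _ in enumerate(hits, start=1)]
    return [r.patient_id for r in hits]


# Fabricated records for illustration; no real patient data.
RECORDS = [
    PatientRecord("P001", "F", 52, {"hypertension", "diabetes"}),
    PatientRecord("P002", "F", 44, {"hypertension", "diabetes"}),
    PatientRecord("P003", "M", 60, {"hypertension", "diabetes"}),
    PatientRecord("P004", "F", 67, {"hypertension"}),
    PatientRecord("P005", "F", 45, {"hypertension", "diabetes", "asthma"}),
]

QUERY = {"sex": "F", "min_age": 45, "required_conditions": {"hypertension", "diabetes"}}

identified = find_candidates(RECORDS, **QUERY)
anonymous = find_candidates(RECORDS, **QUERY, de_identify=True)
print(identified)
print(anonymous)
```

Everything here runs inside the site’s own environment, which mirrors the point made earlier: the search happens in the library and nothing has to leave it.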

Once the list of eligible patients is produced, users with an authorized username are able to access the list to begin contacting the patients to request study participation. Healthcare providers are able to use the tool to dramatically accelerate pre-screening and enrollment. Because Clinithink understands and respects the requirements of HIPAA, the Common Rule and the role of the IRB in the process of identifying patients for clinical trials, it provides the following language to include in IRB meeting minutes to document waiver approval:

“…finding no or minimal risk of exposure of patient’s health information because the information never leaves the [principal investigator’s entity] and because no individual from outside the [principal investigator’s entity] workforce ever has access to PHI and because the scope of the waiver is only as to the preparatory – to – research/subject identification process. Once identified, the patients’ consent will be obtained before they are enrolled in the study, as normal.”

Let us help you get approval from your IRB and find patients to enroll 10X faster. Email now.

CNLP Technology: Build or Buy?

Long gone are the days when IT departments consisted of few staff members whose role was a mystery to most and spent most of their time answering the expected why-doesn’t-it-work? questions from colleagues. These days, IT is integrated into almost every process in just about every kind of organization. IT departments are now sophisticated, made up of highly skilled professionals and more often than not, very busy.

Healthcare is no different and IT is lauded as the gatekeeper to better quality, more affordable healthcare that is accessible to more people. It comes as no surprise that when vetting new innovative technologies, the inevitable question is whether to buy it off-the-shelf or build it in-house. Clinical Natural Language Processing (CNLP) is one technology that falls into that category because its applications in healthcare are endless and its impact is powerful, so why wouldn’t organizations want to build it themselves?

DIY or don’t?

The basic checklist for whether or not to build CNLP technology includes a rare range of knowledge and skills that span:

  • the practice of clinical medicine;
  • artificial intelligence algorithms;
  • linguistics; and
  • software engineering.

Even if an organization has these skills in theory or in isolation of each other, developing CNLP algorithms that read and interpret clinicians’ notes requires detecting and structuring complex medical phrases using both clinical and linguistic knowledge, and an understanding of a relevant terminology or ontology used for structuring CNLP output into standardized formats. Building such a system takes years of application of expert knowledge (typically doctoral-level), the curation of large volumes of training data, and lots of trial and error. Building it into a highly available, high performance, accessible product multiplies the effort considerably. This discovery often leaves organizations looking for alternatives to building it entirely themselves, often in the form of Open Source.

Keeping your options Open

There are numerous open source NLP toolkits available that may be a viable alternative to developing a solution from scratch. The fact that they are ‘free’ adds to the appeal considerably but the fact is that what you save in license fees, you’ll spend on developer time and resources with no guarantee that you’ll get it right.

The downside to Open Source NLP is that it’s not developed continually, it’s not documented, there’s no roadmap and there’s no support. In addition, NLP toolkits have very few applications or solutions and even fewer clinically orientated applications. Even if you do find one it’s most likely to be limited in its application or ‘special case’ over multipurpose, and will require:

  • training data;
  • considerable software engineering (don’t underestimate this one! A conservative estimate is 50%+ of developer time);
  • time investment; and
  • testing and development to address deficiencies in performance and interoperability.

Taking this route essentially means building up a team of clinical, technical and business professionals to make sense of these newly discovered NLP capabilities. Together, you’ll need to work out querying, analytics via querying across documents, and integration. And while open source might be attractive in its potential, it also opens up a series of risks (including security of patient data) and investment that can be avoided completely by choosing the right solution partner. In addition, the overall time investment delays the benefits that the decision for CNLP can realize for the institution.

Easy does it

Perhaps a deciding factor when weighing up whether to build or buy a CNLP technology is how soon you want and need to access your unstructured clinical data to help solve complex healthcare challenges. If it’s sooner rather than later, then choose a solution that meets the following requirements:

  • HIPAA compliance;
  • Encryption for communication and any stored PHI; and
  • De-identified data.

Clinithink’s CLiX ENRICH is a best-of-breed CNLP technology that eliminates the risks associated with building in-house or open source. Choosing CLiX ENRICH means:

  • Lower Total Cost of Ownership: It’s unavoidable that the cost in terms of development time, and continual maintenance and improvement is much higher than purchasing a complete solution. When calculating the overall costs, it’s imperative to take a long-term view to understand affordability and in most instances, choosing a strategic CNLP partner is better value for money.
  • Immediate benefits realization: With CLiX ENRICH you’re able to tackle business and clinical problems right from the start rather than losing years to development time and stalling any improvements you want to make. On the other hand, open source projects are often believed to be ‘free’ but usually have complex dependencies with varying license terms. Even if there is an option that makes economic sense in the short term, the initial savings will be spent on developer time and resources.
  • Support: CLiX ENRICH users have access to professional support 24/7. Opting to go it alone is a heavy burden that can be avoided by purchasing an off-the-shelf solution.
  • Updates: Open source CNLP projects are largely academic, and tend to stagnate in the absence of ongoing, related research. A purchased solution provides a reliable schedule of content and feature updates that are in response to user needs and market demands.

Moreover, choosing CLiX ENRICH affords you benefits that make a marked difference when you are trying to use unstructured data to solve business and clinical challenges. With CLiX ENRICH:

  • You don’t need to clean your data before you process it;
  • You can customize queries specific to your environment and set of challenges;
  • Output can be mapped to a format of your choice;
  • CLiX ENRICH is easy to install and doesn’t compromise patient privacy or data security.

Want to know more about how to avoid the pitfalls of building or using open source CNLP tech? Contact us at

About the Author

Jack Kowitt has been a recognized IT CIO leader whose career has been close to the leading edge of bringing technology to improving healthcare operations, both clinical and financial, and services to patients, from the earliest EHR implementations through current data analytics and natural language tools. He has held leadership roles at SUNY, Mt. Sinai (NY), Samaritan (Banner) and Parkland (Dallas).

Negating Lasagna’s Law


You know what it’s like. You take on a clinical trial convinced that you have enough of the right patients to meet recruitment targets. But you haven’t gotten very far down the funnel when you realize that you may have greatly overestimated the cohort of patients that meet trial criteria. It’s common practice for those involved in finding patients for clinical trials and research to rely on their impression of viability within their patient population; but as in/exclusion criteria get narrower and narrower, there’s an opportunity to rely on your data to determine the likelihood of success in recruitment instead.

A recipe for disaster

Lasagna’s Law, or “the incidence of patient availability sharply decreases when a clinical trial begins and returns to its original level as soon as the trial is completed,” is a phenomenon that’s been around since the 1970s. In the last few years, we’ve seen some innovation in the recruitment space, but those solutions still rely heavily on manual processes that include clinician/investigator intervention and time.

It’s estimated that over 70% of the patients that eventually enroll in clinical trials are within an investigative site’s existing patient population, and yet almost 40% of sites under-recruit and more than 10% of sites fail to recruit any patients at all. So perhaps the problem isn’t entirely about a sheer lack of eligible patients, but about accessing the data you need to filter patients against trial criteria.

Say you need to enroll 100 female patients between the ages of 40 and 65 into a diabetes study. Your structured data can offer you a top-of-the-funnel idea of how close you can get to recruiting 100 patients. But you also need to know: BMI, blood glucose levels, history of cardiac conditions and incidences of corticosteroid use to know how many of the patients you’ve identified meet trial criteria. This information is stored as unstructured data or clinical narrative and when you’re able to access it, can give you the whole picture of a patient and therefore, their eligibility for a trial. But again, you’re relying on clinician time and resource to manually review copious amounts of unstructured patient records – a strategy that doesn’t get you closer to recruitment targets fast enough.

Fortunately, there’s a paradigm shift in patient recruitment. What if you had meaningful, actionable insights into your patient population at your fingertips that you could use to make informed decisions about whether to take on a study and whether you could meet enrollment targets within the stipulated timeline and budget?

Hindsight versus foresight

It’s often said that “the best time to plan a controlled trial is after the trial has finished” because all the questions you need to be able to answer before starting a trial have already been answered. But at that stage, you’ve already missed deadlines, possibly gone back to the Sponsor to ask for amendments to in/exclusion criteria, gone over budget and/or exhausted clinician resources.  What you really need is to be able to answer feasibility questions with concrete data before the trial starts.

Clinithink’s CLiX ENRICH is a game changer in clinical trials because it automates the search and pre-screening of patients against trial-specific criteria. That means you can process millions of unstructured patient records in hours and be left with a list of potential patients to enroll ranked in order of eligibility.

Case studies have shown that using CLiX ENRICH to truly automate the pre-screening stage has yielded 10X the number of quality, eligible patients in ¼ of the time it takes to do so manually. This marked reduction in time, manual effort, errors and educated guessing can only have a positive impact on your clinical trial enterprise.

Study feasibility is a pretty ambiguous term that’s used widely but understood differently from person to person. One way to set benchmarks in determining feasibility is to use data – structured and unstructured – to offer a more precise approach than educated guessing. Enlisting CLiX ENRICH to determine recruitment feasibility is only the start of broader data-driven study feasibility that gives investigators the information they need to best use their judgement, while leaving the information mining to a tried and tested technology.

Download the White paper or contact us to find out more.

Sheryl Lowenhar, MBA, RPh, is Vice President of Sales and Marketing for Clinithink

3 Technologies Changing Clinical Trials – Notes from SCOPE 2017

Most of us have experienced firsthand how technologies like Mobile, Big Data, Cloud and Social have transformed Finance, Retail and Manufacturing. But the impact of these and associated technologies on Healthcare has been more like an evolution than a revolution while stakeholders figure out how to leverage tech to solve complex challenges and derive tangible value from them. But the clinical trials landscape is changing, or evolving – albeit slowly. At this year’s annual SCOPE Summit, the largest gathering of clinical operations executives, it was obvious that innovation in clinical trials is gaining momentum to address specific pain points in how clinical trials are conducted.

If you attended the Summit you would have witnessed hundreds of vendors offering very niche solutions, along with well-known big industry players and CROs.  So we know there are lots of pieces to the puzzle but the bigger challenge hasn’t changed much: how to get new drugs to market faster in order to help more people.

Sponsors are continually looking for ways to bring efficiency into processes while maintaining patient safety and keeping them at the center of innovation.  Clearly, one of the biggest challenges is patient recruitment. I’ve listed the top three technologies getting the most buzz at SCOPE  and that I believe are the drivers for a paradigm shift in patient recruitment.


eConsent

This initiative by TransCelerate harnesses consumer-driven tech adoption by digitalizing consent for clinical trials. eConsent empowers patients to make a truly informed decision about whether to participate in a study by giving them easy-to-understand clinical trial information. The tool, available via web or mobile, also streamlines the consent process for sites by reducing time-consuming explanations, paperwork and dropout rates. Sponsors benefit because of reduced on-site consent monitoring and corrective action based on consent audits. I believe this tool is increasing patient engagement in a way we haven’t seen before and will become commonplace in clinical trials to come.


Natural Language Processing

Natural Language Processing (NLP) isn’t new but using NLP to find patients for clinical trials is. Using technology to automate and accelerate the search and pre-screening of patients is a leap forward in improving efficiencies in the patient recruitment process at research sites. We’re doing some ground-breaking work at Clinithink in this area. Working with us, Steve Coca, DO, of the Icahn School of Medicine at Mount Sinai presented results from two case studies showing that this method finds 10X the number of eligible patients in a quarter of the time it takes traditionally. Think about the difference in timelines and manual effort if sites were using Clinithink’s CLiX ENRICH tool.


Wearables

Given the explosion of wearable health monitors, there’s a good chance you’re wearing one right now. For the clinical trials industry, wearables present a great opportunity to identify potential patients and gather enormous volumes of data spanning an individual’s lifespan. This is consumer-driven tech at its finest, with the majority of users wearing devices 24/7 to capture vital data as a byproduct of carrying out daily life. Multiply that by millions of consumers, and patients, and we’re left with a valuable way to collect data remotely and cost effectively.

The Next Frontier

Alexa, how many patients do we have?  As we automate our lives, can we automate more areas within the clinical trials environment?

One particular challenge that came up numerous times throughout the Summit across various conversations is the accuracy of feasibility analyses. While there seems to be progress in terms of making better use of EMR data, statistical modeling and knowledge of past site performance, there is still room for improvement and innovation.

I’ll be watching developments in this space very closely in the coming year or two. I suspect that as technologies are leveraged to solve specific pain points in clinical trials, the opportunity to use them and others to refine feasibility accuracy isn’t far behind.  Could NLP help?

Sheryl Lowenhar, MBA, RPh, is Vice President of Sales and Marketing for Clinithink

Barbara E. Bierer, M.D., Joins the Clinithink Board

We sat down with Dr Barbara Bierer to discuss her experiences in clinical trials, Clinical Natural Language Processing and why she decided to accept a Board position at Clinithink.

Barbara E. Bierer, M.D., a hematologist-oncologist, is a Professor of Medicine at Harvard Medical School and the Brigham and Women’s Hospital. Dr Bierer co-founded and now leads the Multi-Regional Clinical Trials Center at Harvard and the Brigham and Women’s Hospital (MRCT Center), a collaborative effort to improve standards for the planning and conduct of international clinical trials with a particular focus in the developing world. In this capacity, she works with regulators around the world (USFDA, EMA, CFDA, CDSCO, and others), major pharmaceutical companies, the biotech industry, clinical research organizations, academia and patients/patient advocates to harmonize policies for and approaches to clinical trial conduct and regulation.


Q: How did you first become interested in Clinical Natural Language Processing?

A: Clinical Natural Language Processing (CNLP) has been around for a long time. I was introduced to CNLP over a decade ago at Harvard and it was apparent to me that the technology would have a transformational impact on clinical trials as well as clinical care.

Back in 2004, the NIH was interested in creating an informatics framework to understand complex genetic disease through examining large patient data sets. They funded a center, named i2b2 (Informatics for Integrating Biology & the Bedside), at the Laboratory of Computer Science (LCS) at Harvard Medical School, as Harvard was already considered a pioneer and leader in the area of healthcare informatics and clinical systems. Initially, the primary mission of the Center, among other endeavors, was to build an open-source platform that would analyze and extract content found in physician notes, discharge summaries and other clinical documents. Over 100 national and international research organizations currently use the i2b2 platform to parse data from their own internal clinical data sets and to collaborate with others.

Q: How was CNLP first used in clinical trials?

A: As I remember, an early application of CNLP was to understand patient smoking status. There are many ways to describe smoking habits and, at the time, none were captured in structured data elements in electronic health records. For example, a physician might record in clinical notes that a ‘patient never smoked’, ‘patient smoked two packs a day from 1995-1998 and abruptly stopped’, or ‘occasional tobacco use over past 20 years’. While we have used EHRs for clinical trial screening for years, this kind of information, stored only in text documents and notes, was not easily available to the investigator. In addition, there are relevant and informative data stored in PDFs such as radiology reports that are not part of EHRs. Coding these medical records means a study nurse or other individual reviewing and coding charts, one by one, in order to do outcomes research or to find patients that would be good candidates for clinical trials. The ability to use CNLP was very helpful in making this data available to investigators, and in saving time and expense for the study team.
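To make the smoking-status example concrete, here is a deliberately simple Python sketch that maps the three note phrases quoted above onto a structured value. It is a toy keyword matcher, not how i2b2 or any CNLP engine actually works; the category names and patterns are assumptions chosen for illustration, and real clinical text needs far more (negation, context, misspellings).

```python
import re

# Ordered rules: the first matching pattern wins. The labels and patterns
# are hypothetical and cover only a handful of phrasings.
RULES = [
    ("never", re.compile(r"\bnever smoked\b|\bnon-?smoker\b", re.IGNORECASE)),
    ("former", re.compile(r"\b(stopped|quit|ex-smoker|former smoker)\b", re.IGNORECASE)),
    ("current", re.compile(r"\b(smokes|tobacco use|packs? a day)\b", re.IGNORECASE)),
]


def smoking_status(note: str) -> str:
    """Return a coarse smoking-status label for one free-text note."""
    for label, pattern in RULES:
        if pattern.search(note):
            return label
    return "unknown"


NOTES = [
    "patient never smoked",
    "patient smoked two packs a day from 1995-1998 and abruptly stopped",
    "occasional tobacco use over past 20 years",
]

statuses = [smoking_status(n) for n in NOTES]
```

Rule order matters here: the second note mentions ‘packs a day’ but also ‘stopped’, so the ‘former’ rule is checked before the ‘current’ rule. That kind of ambiguity is one reason chart review by keyword alone is unreliable.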

“[Clinithink] has an incredible opportunity to use its technology to change the way we find and recruit patients for clinical trials…”

Q: When were you first introduced to Clinithink?

A: I first met Dr Chris Tackaberry, co-founder and CEO of Clinithink, a couple of years ago. When he described CLiX ENRICH technology, I realized how enabling this would be for finding eligible patients for clinical trial recruitment purposes. If the technology performed as described, it would not only shorten the time to identification but also increase the number of potential participants substantially.

Q: Why did you accept a board position?

A: It was evident to me, from my first conversations with Chris, that the company has an incredible opportunity to use its technology to change the way we find and recruit patients for clinical trials. I knew from my prior experience that CNLP can be powerful – but it is not easy to do well. I believe that Clinithink is far ahead of any groups working in this area. As a board member, I hope to be able to contribute to the future strategic direction of the company and its role in the clinical trial recruitment enterprise. It is an exciting time for me personally to join the Board and, importantly, an exciting time for the company.

Maintaining patient data security and privacy at Mount Sinai with CLiX ENRICH [Video]

Maintaining the security and privacy of patient information is paramount to conducting clinical research and often a prohibitive factor in deploying third party solutions.

Forerunners in the use of automation to find eligible patients for clinical trials highlight an advantage of CLiX ENRICH for Clinical Trials: it can process clinical narrative from EMR data that is full of references to Protected Health Information (PHI) and extract relevant de-identified structured data.

In this video blog, we talked to Dr. Steven Coca, Dr. Girish Nadkarni and Stephen Ellis from Mount Sinai about how CLiX ENRICH for Clinical Trials enables the use of rich patient data while maintaining the security and privacy of patient information.

Get the Whole Patient Picture


ICD-10, or the International Classification of Diseases 10th Revision, is used to collect morbidity and mortality information for populations all over the world. As a late adopter of the classification, the United States has its own specialized version of this clinical coding system, ICD-10-CM/PCS. The U.S. also uses this classification, as do many other countries around the world, for billing purposes.

The way it works is that physicians take down the patients’ stories in the medical records, in a format called unstructured clinical narrative, and then that information is condensed into a few lines of structured information, the ICD codes. This ends up being just the most germane conditions of the patient, and usually only those that will be relevant for reimbursement of services. Very few providers actually submit more ICD codes than necessary when translating the unstructured narrative into structured codes. Providers can actually be penalized if they do apply more ICD codes than necessary as this can affect the reimbursement they receive. This makes the structured data an incomplete, and therefore inaccurate, picture of the patient as a whole.

Take note

Traditionally, clinical coders or the practitioners themselves will condense that narrative into the codes. When a clinical coder conducts the process, a manual review of the chart is necessary to pick out the pertinent aspects of the patient story, and translate them into ICD codes. This manual review process means coders are often alternating between many different documents which increases the likelihood of human error.

Even with 68,000 different ICD-10-CM codes demarcating diseases and health conditions, not all are represented, and even fewer are captured completely. The massive expansion in the number of ICD-10-CM codes, compared with just 13,000 in ICD-9-CM, is the result of an attempt to introduce more granularity (more detail) into what can be captured as structured data. Because of the way that ICD-10 is designed to be used, this necessarily means listing all of the permutations of additional context and giving each one a unique code. Even so, there are still aspects of conditions that impact clinical decision making which cannot be represented within ICD-10. These include temporal context (‘history of’ or ‘present’ conditions), severity and acuity, laterality, anatomical representation (‘distal’, ‘anterior’) and frequency (‘nightly’, ‘every six hours’, ‘once or twice a week’), just to name a few. An example of a non-specific ICD-10 code that is meant to capture any of the myriad digestive system diseases and conditions is “Personal history of other diseases of the digestive system,” Z87.19. SNOMED, by contrast, has over 300 variations of digestive conditions that can then be specified by severity, history, laterality and frequency.

De-code patient stories

Adding more codes to capture more detail seems like a good idea but the inevitable explosion of the number of possible codes to choose from makes the whole scheme harder to use. There are just too many possibilities within the unstructured clinical narrative to be accounted for and to translate into structured data. Or are there?

SNOMED CT is a systematized nomenclature of medicine and clinical terms (hence the name), which can be thought of as the language of medicine that is used within healthcare to structure clinical narrative. It is the most comprehensive and precise clinical terminology available in the world today. And while even SNOMED CT cannot capture every single possibility, with nearly 1.5 million different relational combinations it comes a lot closer than ICD-10-CM. There are 349,473 core concepts, over five times as many as there are ICD-10-CM codes for conditions, but much more important than the total number of codes is the fact that the concepts can be combined to create structured information for ideas that are not actually listed in the terminology. SNOMED CT can actually represent billions of possible ideas precisely because it doesn’t have to list all the possible permutations like ICD-10-CM does.

Put it into context

Combining these concepts to create contextual meaning is called post-coordination. By using post-coordinated SNOMED CT concepts, very specific and very complete depictions of patients’ stories can be created as structured data. Learning to use SNOMED CT in a fully post-coordinated way is undoubtedly harder than simply searching for individual concepts, which is why few health-care system providers have attempted to implement it. The good news, however, is that Clinithink have removed the barrier to effective, richly-detailed structured data generation by developing a powerful CNLP (Clinical Natural Language Processing) engine that takes unstructured clinical narrative and turns it into structured data utilizing post-coordinated SNOMED CT concepts.
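The idea of post-coordination can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the concept and attribute labels are plain-text placeholders rather than real SNOMED CT identifiers, the rendered string only loosely resembles SNOMED CT’s compositional notation, and the counts passed to `expressible_count` are invented.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class Expression:
    """A focus concept refined by (attribute, value) pairs."""
    focus: str
    refinements: Tuple[Tuple[str, str], ...] = ()

    def refine(self, attribute: str, value: str) -> "Expression":
        # Return a new expression; the original stays unchanged.
        return Expression(self.focus, self.refinements + ((attribute, value),))

    def render(self) -> str:
        if not self.refinements:
            return self.focus
        parts = ", ".join(f"{a} = {v}" for a, v in self.refinements)
        return f"{self.focus} : {parts}"


def expressible_count(n_focus: int, options_per_attribute) -> int:
    """Number of distinct meanings when every attribute is optional.

    Each attribute contributes (options + 1) choices: one of its values,
    or left unspecified.
    """
    return n_focus * prod(n + 1 for n in options_per_attribute)


# Placeholder labels, not real terminology content.
ulcer = (
    Expression("gastric ulcer")
    .refine("severity", "severe")
    .refine("temporal context", "history of")
)
rendered = ulcer.render()

# 300 focus concepts with 3 severities, 2 temporal contexts, 2 lateralities.
total = expressible_count(300, [3, 2, 2])
```

With these made-up numbers, 300 listed concepts and 7 listed qualifier values yield 10,800 distinct meanings, whereas an enumerating scheme would need a separate code for each one.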

Today, finding patients that match complex clinical trial eligibility criteria using structured data is rife with complications and inaccuracies. The ICD-10 code just can’t describe the patients’ conditions, diseases and current state well enough to positively identify them to match most of the more complex protocols which are becoming more prevalent.

But what if you could use post-coordinated SNOMED data instead to match patients? CLiX ENRICH for Clinical Trials does just this. By utilizing the unique CNLP engine, combined with an easy-to-use, built-in querying module, CLiX ENRICH makes it possible to define the specific criteria required for a clinical trial (or any other type of audit or analysis) and then search through the automatically-generated structured data from millions of clinical records in just a few hours and find precisely the right individuals for any given scenario.

About the Author

Dr Richard Gain, MD joined Clinithink in August 2011. He manages the team of clinical terminologists who work to further develop and maintain the products as well as providing operational and training capability. He brings a wealth of experience of clinical terminologies and coding schemes as well as the practical deployment of clinical information technology. Dr. Gain is also an active member of the SNOMED CT UK Edition committee.

A paradigm shift at Mount Sinai [Video]

Traditional methods of patient recruitment are no longer feasible as clinical trials grow ever more complex – spanning across multiple sites, searching wider patient populations and often falling short of target.

Researchers at Mount Sinai are making a paradigm shift in how they find and rank eligible patients by using CLiX ENRICH for Clinical Trials to analyze vast amounts of clinical data in the hundreds of thousands of patient records they hold. By replacing manual methods with automation, they’re setting a new standard in timely, cost effective patient recruitment.

In this video blog, we spoke to Dr. Steven Coca, Dr. Girish Nadkarni and Stephen Ellis to understand the difference that CLiX ENRICH for Clinical Trials is making.


4 Reasons to Automate Patient Recruitment

More drugs are being approved at a faster rate than ever before. Record numbers of new drugs were approved in 2014 and 2015, with marked increases in the approval of first cycle CDER applications. The increase in approvals can be attributed to both an increase in applications submitted for first in class drugs and orphan drugs, as well as the evolution in the way the FDA works with industry, and their offering four new paths for expedited development and/or review.

Automation in clinical trials is becoming more commonplace as CROs and research sites look for smarter strategies to keep up with the momentum building in the pharma industry to develop new drugs and take them to market. Patient recruitment is the most crucial function of any study but remains largely manual and paper-based, requiring vast amounts of clinicians’ and researchers’ time that doesn’t necessarily result in identifying and enrolling suitable patients. The need to address this long standing stumbling block in clinical trials has led investigators and trial managers to uncover the main drivers behind automation: safety and financial.

Enrol patients safely, 10X faster

The fact is that recruitment can still come to a standstill even after applying numerous strategies, including the manual review of patient data. Other recruitment strategies, such as outsourcing, are not options in many instances because enrolling patients is more than fulfilling a trial requirement. Patient safety, confidentiality and consent are all central to a successful study and outsourcing can compromise any or all of these.

Sites have reams of patient data, both in structured and unstructured form, which can be used to not only find patients but find the right patients. Automating the analysis of this data means that sites can identify eligible patients within their patient populations up to 10X faster and most importantly within clinical governance guidelines and while maintaining patient confidentiality.

Meet study protocol

The clock is always ticking in clinical trials; whether it’s to find new patients, enrol eligible patients, enlist new sites or get new drugs to market. Recruitment timelines are often not met, in part due to the increasing complexity of inclusion and exclusion criteria; and also the difficulty in assessing all the available information that impacts the identification of suitable subjects.

Sites such as ours may deem that we have the right type and number of patients to participate in a study, but changes in entry criteria during the regulatory phases can make the criteria more stringent, delaying the process and therefore tightening timelines even further. At this point, automation could play a pivotal role in ensuring a site can recruit to target and meet exacting deadlines.

When we had exhausted recruitment for a complex study we piloted CLiX ENRICH for Clinical Trials to analyse patient data, working with the Clinithink team to ensure that the search was optimised against inclusion and exclusion criteria. Within just weeks of incorporating CLiX ENRICH we identified suitable patients and were able to enrol additional subjects over and above our initial target.

Gain competitive advantage

It’s expensive and time consuming for pharma companies to enlist new clinics or GPs to expand the search for eligible patients. As a result, sponsors would rather work with fewer, quality sites with better patient numbers. As a site, there are obvious financial benefits to having a proven track record of fast and high-value recruitment metrics.

Sites are under more and more pressure to look for efficiencies and gain a competitive advantage as the trend of faster approval of new drugs continues to escalate. In fact, the ability to verify quickly and easily that you have the appropriate patient population for a study will become an expectation as automation grows in the industry.

Financial gains

All studies vary in complexity, timelines and fee structure, and the financial benefits of automation are both direct and indirect. From the outset you are realising significant savings by reducing staff costs and the time spent collecting, scanning and sifting through patient notes, freeing staff up to return to clinical activities.

For sites such as ours, we generally need to enrol a minimum of four patients to recover the cost of attending sponsor meetings, the initiation period, screening visits, and so on.

For more complex studies that require drug infusions, numerous blood sampling timelines, and/or more than one randomisation within the project, it’s essential that we minimise the financial implications of treating the indication being studied, especially when treatment requires the use of expensive medicines or testing such as MRI scans. Therefore, optimising patient numbers is crucial, even if there is a cap on recruitment, to ensure that you are not only meeting trial timelines but operating profitably and maximising the opportunity for repeat business.

That said, the cost and resources involved in developing an automation tool for patient recruitment are prohibitive for individual sites. The results from our pilot with CLiX ENRICH are impressive and we are continuing to work with Clinithink to evaluate results across a number of clinical studies where different inclusion and exclusion criteria need to be applied.

Employing the service of an existing tool that has been proven to increase the pre-screening yield and rate of enrolment is a way forward for CROs and research sites to remain competitive and meaningfully contribute to the growing rate at which new drugs are being brought to market.

About the Author

In this post Kathie Wareham, Clinical Research Unit Director at the Joint Clinical Research Facility (JCRF), part of ABMU Health Board in collaboration with Swansea University, discusses why automation is valuable in expediting feasibility and patient identification in clinical trials.


Needles in a haystack? No problem.

Recruiting patients for clinical trials is a notoriously difficult and complex task. Typically, research sites rely on labor intensive, very inefficient and time consuming manual processes, including chart review and clinician memory, to identify potential candidates. Even if you’re considering using a recruitment partner, they are still relying on conventional methods of advertising, manual vetting and social media mining to find the right candidates. But there’s still a good chance that even when all of these options have been exhausted, clinical trials will still be stalled by several months or even years, as traditional recruitment methods are by nature unproductive and flawed.

As a trial sponsor or Principal Investigator at a research site, you are all too familiar with the delays in recruitment and the negative impact on budget and timelines. But that opens up the question of where else to look to find the right trial candidates?

Pinpoint your pain points

For many site managers, health data is an available and necessary source of patient information, but it’s also well known that manual review of patient charts and records simply takes too long and is prone to error. Added to that is the increasing complexity of inclusion and exclusion criteria that often can’t be met using the information in structured EHR data alone.

Clinical data is either structured or unstructured. Structured data mainly accounts for patient demographics, procedure and diagnostic codes, pharmacy dispensations and lab values. Unstructured data is everything else – the narrative written by clinicians about their patient encounters – and almost always includes the information contained within the structured data. For a Principal Investigator, the real reason to take a second look at your unstructured data is that approximately 60-70% of clinical trial eligibility criteria can be sought only from unstructured data.

Within a typical patient record, documents that contain valuable unstructured data are mainly:

  • Progress notes
  • Discharge summaries
  • Emergency Medicine notes
  • Outpatient Records
  • Radiology reports
  • Consultation Notes
  • Operation notes
  • Pathology reports
  • Medication lists

But accurately reviewing and processing all of these documents and sources is virtually impossible, right? It is if you’re doing it manually. But this is where Clinithink’s CLiX ENRICH for Clinical Trials is your key to unlocking information stored as unstructured data.

On pins and needles

Studies show that applying technology that automates the review of unstructured data, and returns information that can be used to make decisions at critical points throughout the patient recruitment stage, can result in significant savings in time and money for both the site and sponsor.

Every provider and research site has unstructured data stored, but including CLiX ENRICH for Clinical Trials as an essential component of carrying out a trial means that you can leverage all sources of data to find quality candidates that meet protocol criteria, even for rare disease trials. Typical data sources include:

  • Clinical Data Warehouses;
  • Enterprise EMR Systems;
  • Departmental EMR Systems;
  • Transcription Systems;
  • Document Management Systems; and
  • Local File Servers

It isn’t feasible (or even possible!) to enlist manual techniques to search for potential candidates using the data you already have, but that’s where CLiX ENRICH for Clinical Trials makes a remarkable difference to trial timelines. Firstly, it’s a machine. It doesn’t get tired like humans. This means that not only can you feed CLiX ENRICH tons of data to find the information you’re looking for; it also finds 10X more pre-screened candidates matching the clinical trial protocol in ¼ of the time. In fact, the more unstructured data you process using CLiX, the more you’ll know about your patient population against trial criteria to inform trial feasibility and site selection.

Secondly, you don’t compromise the safety or ownership of data by employing CLiX ENRICH. All data, structured and unstructured, remains securely within the control of the site. For many, the idea of processing patient records through a third party solution provider is halted by questions around data security, access and potential threats. However, with CLiX ENRICH for Clinical Trials, you don’t share your data with Clinithink, or any unauthorized individuals and entities.

What does all this mean for your clinical trials and launching of the investigational product? It means that whether you’re planning a clinical trial that requires the enrollment of rare, hard-to-find patients and/or enlisting multi-site participation for a more common condition, you can use the data already available to you to improve the accuracy of candidate vetting and speed up recruitment in a way that has never been seen before in the clinical trials industry. Using CLiX ENRICH for Clinical Trials makes it all the more possible to find the needles in the haystack quickly, easily, cost effectively and securely.

About the Author

Phil Davies is COO at Clinithink where he is responsible for the delivery of Clinithink technology. Phil has more than 15 years’ experience working in healthcare IT and uses that experience to support the transformation of clinical trials recruitment through CLiX.

Overcoming Recruitment Hurdles


Failure to recruit a sufficient number of suitable patients can delay a clinical trial by years, or even worse, result in failure to even get “out of the starting blocks”. Fortunately, automated on-site review of unstructured clinical narrative in a clinical data warehouse and/or EMR (Electronic Medical Record) using CLiX ENRICH for Clinical Trials can identify exponentially more eligible patients by overcoming the hurdles associated with traditional recruitment methods.

Cost of recruitment delays

Every month by which the drug development process can be shortened is worth $25 million in revenue for the average bio-pharmaceutical sponsor.  By unearthing potentially suitable patients, reducing screen failure rates, and increasing accrual rates, CLiX ENRICH for Clinical Trials can dramatically reduce the time and costs incurred when sites, Clinical Research Organizations (CROs) and sponsors fail to enroll sufficient patients.


More targeted than traditional recruitment

While traditional outreach activities – such as placing ads, posting on social media, working with advocacy groups, sending out newsletters – have their place, a paradigm shift is taking place in clinical trial recruitment. Clinical trial sites, CROs and sponsors are turning to ‘data mining’ of unstructured clinical narratives to identify suitable patients. CLiX ENRICH for Clinical Trials quickly queries unstructured clinical text in order to pre-screen patients for trials.

Traditional outreach activities do bring in many ‘self-identified’ patients. However, during the screening process, many people will turn out not to meet complex protocol eligibility requirements necessary for enrollment. The numbers of eligible patients found with on-site pre-screening conducted via manual EMR review can be hopelessly low and eligibility poorly matched.

Reduce screen failures

CLiX ENRICH uses CNLP (Clinical Natural Language Processing) to automatically and efficiently pre-screen patients by querying unstructured data. Accurate and expedient, the technology-enabled solution removes the need for extensive manual chart review almost entirely, releasing valuable time traditionally spent by site staff on this laborious task.

CLiX ENRICH processes large volumes of unstructured physician narrative, progress and transcription notes and discharge summaries found in data warehouses and EMRs. The software then organizes all patients according to how closely the clinical documentation matches a trial’s inclusion/exclusion criteria. The result is an ENRICHed List, a prioritized ranking of highly eligible patients that allows investigators to be much more targeted in whom to approach for screening and consent.
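
To make the idea of a prioritized list concrete, here is a minimal Python sketch of ranking patients by how closely their extracted findings match a trial's inclusion/exclusion criteria. It is not the CLiX ENRICH algorithm, which is not described here; the `Patient` class, the scoring rule (fraction of inclusion criteria met, zero if any exclusion criterion is present) and the finding labels are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    # Findings already extracted from the patient's notes (hypothetical labels).
    findings: set[str] = field(default_factory=set)


def eligibility_score(patient: Patient, inclusion: set[str], exclusion: set[str]) -> float:
    """Fraction of inclusion criteria met; 0.0 if any exclusion criterion is present."""
    if patient.findings & exclusion or not inclusion:
        return 0.0
    return len(patient.findings & inclusion) / len(inclusion)


def rank_patients(patients: list[Patient], inclusion: set[str], exclusion: set[str]) -> list[tuple[str, float]]:
    """Return (patient_id, score) pairs, best match first, dropping zero scores."""
    scored = [(p.patient_id, eligibility_score(p, inclusion, exclusion)) for p in patients]
    eligible = [pair for pair in scored if pair[1] > 0]
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; these are not real patients or real trial criteria.
patients = [
    Patient("A", {"type 2 diabetes", "nephropathy"}),
    Patient("B", {"type 2 diabetes"}),
    Patient("C", {"type 2 diabetes", "nephropathy", "dialysis"}),
]
inclusion = {"type 2 diabetes", "nephropathy"}
exclusion = {"dialysis"}
ranking = rank_patients(patients, inclusion, exclusion)
print(ranking)
```

In this toy setup, patient C is dropped because of the exclusion criterion, and the remaining patients are ordered by how many inclusion criteria their documentation supports.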

ENRICHed List Prioritization Rankings


The CLiX ENRICH solution

The Icahn School of Medicine at Mount Sinai (MSSM) uses CLiX ENRICH to find appropriate patients for clinical trials. In one study, CLiX ENRICH processed and queried over 500,000 documents, producing a list of 97 highly eligible pre-screened patients that was then reviewed by MSSM investigators.

High quality pharmaceutical clinical research relies on painstakingly identifying the most qualified patients to study. Yet, clinical trial research sites and CROs experience delays in patient recruitment due to increasingly complex eligibility criteria set by sponsors. Low or slow accrual rates – the failure to enroll an adequate number of patients – are a common hurdle that can often be overcome with new technologies.

About the Author

Judith Teall is a well-respected clinical research industry speaker, creative thinker, strategist, and author with a global reputation in the field of clinical patient engagement, and more than 30 years of healthcare and clinical research industry experience. Most recently, Judith chaired Day One of the 2015 MCT (Mobile in Clinical Trials) Congress, and presented and chaired at DIA 2015.

Unlocking Buried Treasures with CNLP


Finding patients matching complex eligibility criteria for clinical trials requires review of unstructured narrative content in clinical notes, radiology/pathology reports, transcription files and discharge summaries. These narratives reveal treasures locked in unstructured data, helping organizations quickly identify patients for clinical trials.

Manual vs. machine reading

Manually reviewing vast numbers of clinical documents to find patients matching complex clinical trial protocols is a slow and cumbersome process. Manual review may miss highly eligible patients, leading to low accrual rates and jeopardizing the timely completion of a trial.

Advances in software now allow computers to “machine read” unstructured data through a process called “clinical natural language processing”. Acquiring the tools and building the infrastructure to “machine read” clinical documents is costly and resource intensive. Fortunately, organizations can license affordable, out-of-the-box solutions, like CLiX CNLP, that read through clinical notes in a fraction of the time and more accurately than less efficient manual methods.

CLiX ENRICH, powered by proprietary CLiX CNLP, processes large volumes of unstructured data, identifying patients whose clinical documentation indicates they closely match a trial’s inclusion and exclusion criteria. The result is a prioritized shortlist of highly eligible patients called the ENRICHed List.

CNLP solutions

CNLP software solutions vary significantly in their functionality, ease of use and hardware requirements. CLiX CNLP, for example, runs on existing system hardware, in contrast to cloud-only CNLP solutions. CLiX CNLP also maps to SNOMED CT, a comprehensive medical terminology that accurately captures the nuances of physician language. After unstructured data is processed by CLiX CNLP, you can then query the data to find eligible clinical trial participants, a topic we will explore in a future blog.

[Infographic: Unlocking Hidden Treasures]
*Prospective Diabetes with Nephrology Complications Clinical Trial with Mount Sinai School of Medicine/Nephrology.

Teaching computers to read

“Natural language processing” (NLP) enables computers to read free-form text by “teaching” the machine rules for understanding human language, including vocabulary, sentence structure (syntax), and word patterns. Designing CNLP algorithms that read and interpret physicians’ natural language input is especially challenging because the software must be taught to understand and interpret complex medical terms, phrases, abbreviations and concepts. CLiX CNLP also takes into consideration common misspellings. The process of taking thousands of pages of clinical documents and converting them to a machine-readable form has been honed over a number of years.

How CLiX CNLP works

  1. Preprocessing
    During this phase, CLiX CNLP “cleans up” inconsistencies in clinical notes. CLiX CNLP standardizes individual providers’ descriptions of patient encounters, as well as abbreviations and acronyms, fixes spelling errors, and corrects incomplete sentences. The clinical narrative is then organized into components such as clinical headings (e.g. chief complaint, family history) and assigned speech tags such as subject, object and verb.
  2. Encoding
    Clinical words and phrases within the unstructured narrative are converted to standardized medical terminology processable by a computer. CLiX CNLP maps all medical terms found in clinical documents to SNOMED CT, a medical vocabulary of clinical terminology which includes over one million medical concepts.
  3. Post-coordination
    After matching clinical terms in free-form narrative to SNOMED CT, CLiX CNLP reads surrounding sentences and paragraphs to extract additional contextual information relevant to identifying appropriate patients for clinical trials. Post-coordination, a unique feature of CLiX CNLP, adds meaning by combining various SNOMED CT terms to form more granular clinical concepts that precisely represent how physicians describe patient conditions in clinical notes.
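
The three phases above can be sketched as a toy Python pipeline. This is only an illustration under stated assumptions, not how CLiX CNLP is implemented: the lookup tables are hypothetical, the `CONCEPT:` labels are made-up placeholders rather than real SNOMED CT identifiers, and qualifiers are attached simply by looking at the two words before a matched term. Durations such as “2 weeks” are not handled.

```python
from __future__ import annotations

import re

# Hypothetical lookup tables for illustration only (not CLiX data, not SNOMED CT codes).
ABBREVIATIONS = {"cxr": "chest x-ray", "od": "once daily", "mets": "metastases"}
CONCEPTS = {
    "pneumonia": "CONCEPT:pneumonia",
    "abdominal pain": "CONCEPT:abdominal-pain",
    "nausea": "CONCEPT:nausea",
}
QUALIFIERS = {"severe": "severity=severe", "mild": "severity=mild", "right": "laterality=right"}


def preprocess(text: str) -> str:
    """Phase 1: lowercase the note, tokenize it, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9\-]+|[.,;?]", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def encode(sentence: str) -> list[tuple[str, str]]:
    """Phase 2: return (phrase, concept label) pairs for terms found in the sentence."""
    return [(phrase, label) for phrase, label in CONCEPTS.items() if phrase in sentence]


def post_coordinate(sentence: str, encoded: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Phase 3: refine each concept with qualifiers found in the two preceding words."""
    results = []
    for phrase, label in encoded:
        match = re.search(r"((?:\w+ ){0,2})" + re.escape(phrase), sentence)
        preceding = match.group(1).split() if match else []
        refinements = [QUALIFIERS[word] for word in preceding if word in QUALIFIERS]
        results.append((label, refinements))
    return results


def run_pipeline(note: str) -> list[tuple[str, list[str]]]:
    """Run the three phases in order on a single sentence of narrative."""
    clean = preprocess(note)
    return post_coordinate(clean, encode(clean))


gi_result = run_pipeline("She describes 2 weeks of severe abdominal pain and mild nausea")
cxr_result = run_pipeline("CXR showed widespread right pneumonia")
print(gi_result)
print(cxr_result)
```

The point of the sketch is the shape of the output: each coded concept carries its own refinements, so “severe” stays attached to the abdominal pain and “mild” to the nausea rather than both floating free in the sentence.
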
Examples of physician narrative converted to post-coordinated SNOMED CT
Asthma & Allergy
Impression: ? Asthma. Given the family history of asthma, I would like to arrange lung function tests to rule this out

  • Asthma, known possible
  • History of asthma in the family
  • Lung function tests requested

Dermatology
Plan: Decrease prednisone to 5mg OD as I believe the facial rash to be steroid induced

  • Prednisone 5mg once daily
  • Rash on skin of face caused by steroids

Oncology
Patient has lung cancer with no mets. Is coping well with ECOG 0

  • Patient has Lung cancer
  • No metastases
  • ECOG performance status 0

Infectious disease
Presented with dirty sputum. CXR showed widespread right pneumonia. Started on Penicillin

  • Dirty Sputum
  • Chest X-ray done
  • Pneumonia, Right side, widespread
  • Penicillin started

GI
She describes 2 weeks of severe abdominal pain and mild nausea

  • Abdominal pain, severe, 2 weeks
  • Nausea, mild, 2 weeks

Unstructured data captured in EHRs and data warehouses holds a treasure trove of information locked in clinical narratives. Machine reading unstructured documents with CLiX CNLP helps organizations quickly tease out those patients whose clinical documentation closely matches a trial’s eligibility criteria.

About the author
Carl Cresswell is the Chief Architect at Clinithink, with over 19 years of healthcare IT experience designing, developing and supporting applications globally. He has almost eight years of experience with NLP technology, six of them spent with Clinithink since its inception.

Lost in Translation

Imagine if everything you observed in a patient encounter had to be compressed into 140 characters, the length of a tweet. This happens daily at hospitals and physicians’ offices around the world as coders translate physician-patient encounters into codes for billing, or what many refer to as “structured” data.  Since close to two-thirds of patients’ information is found in “unstructured” clinical narrative, important information is lost when physicians’ notes are translated into “structured” code[1].

Coders – whether human or automated programs – compress electronic information in clinical document systems into ICD (International Classification of Diseases) and CPT (Current Procedural Terminology) codes that are used for reimbursement purposes.   This “structured” data resides in EMRs, medical coding systems, and data warehouses. While structured data provides codes that categorize patients by diagnoses, lab values, and medications, this coding fails to capture the nuanced, subtle and rich language used when providers document patients’ stories during encounters. This unstructured narrative in clinical notes, radiology/pathology reports, transcription files and discharge summaries reflects a physician’s crucial impressions and interpretations of patients’ conditions, including symptoms, signs, prognoses, and responses to treatment. Most of this information cannot be expressed in structured coded data using ICD or CPT.

Unlock the power of unstructured clinical data

Consider smoking status, an example of an extremely clinically relevant, but often ambiguous, attribute of health status. While ICD and CPT codes exist for “Dependence, drug nicotine” and “Tobacco abuse counseling”, structured data does not reveal the complexities of smoking status recorded by physicians in clinical narrative. Important subtleties recorded during clinical encounters are lost when a physician’s notes are translated into structured codes. These subtleties make a real difference for quality measures, value-based medicine and clinical trial protocols.

Below are just a few of the ways in which clinicians describe a patient’s tobacco or smoking status.

  • “Advised her to stop smoking”
  • “Smoked a pack of cigarettes a day/smokes 1 pack per day/smokes 20 cigarettes per day”
  • “Smoked daily”
  •  “Would like to quit smoking”
  • “Former smoker”
  • “Never smoked”
  • “Smoker, current status unknown”
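
As a rough illustration of why these variations matter, the Python sketch below maps the phrases above onto a handful of smoking-status categories using ordered regular-expression rules. The categories (`never`, `former`, `current`, `ever`, `unknown`) and the rules themselves are assumptions made for this example; they are not CLiX ENRICH logic, and a real CNLP engine handles far more context (for example, “quit smoking in 2010” would be misread by these simple rules).

```python
from __future__ import annotations

import re

# Ordered (pattern, status) rules; earlier rules win. Categories are illustrative assumptions.
SMOKING_RULES = [
    (re.compile(r"\bnever smoked\b"), "never"),
    (re.compile(r"\bstatus unknown\b"), "unknown"),
    (re.compile(r"\b(?:former|ex-?)\s*smoker\b"), "former"),
    (re.compile(r"\b(?:stop|quit) smoking\b|\bsmokes\b"), "current"),
    # Past tense with no timeframe: a smoking history exists, current status unclear.
    (re.compile(r"\bsmoked\b"), "ever"),
]


def smoking_status(phrase: str) -> str | None:
    """Return the first matching status for a narrative phrase, or None if no rule applies."""
    text = phrase.lower()
    for pattern, status in SMOKING_RULES:
        if pattern.search(text):
            return status
    return None


for phrase in [
    "Advised her to stop smoking",
    "Smokes 1 pack per day",
    "Smoked daily",
    "Would like to quit smoking",
    "Former smoker",
    "Never smoked",
    "Smoker, current status unknown",
]:
    print(f"{phrase!r} -> {smoking_status(phrase)}")
```

None of these distinctions survive in a single billing code, which is the gap the narrative-based approach is meant to close.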

Quickly find patients meeting complex eligibility protocols

By recognizing and distinguishing subtle language variations in unstructured notes, Clinithink’s CLiX ENRICH dramatically increases the pre-screen yield during clinical trial patient recruitment in a fraction of the time it takes using current manual methods.

One advantage of leveraging clinical narrative using CLiX ENRICH is the ability to assess patients’ eligibility for clinical trials much more rapidly than manual review currently allows. CLiX ENRICH can “read” through hundreds of thousands of unstructured clinical documents in hours, flagging patients with disease, signs and symptoms closely matching a trial’s inclusion/exclusion criteria. Reproducing this through manual chart review is virtually impossible for pharmaceutical companies, CROs, and principal investigators with limited time and resources.  In a nephrology trial of patients with diabetes, principal investigators at the Icahn School of Medicine at Mount Sinai in New York used the CLiX ENRICH platform to search 535,000 unstructured, clinical free-form text documents in 15 hours, finding 10 times the number of eligible candidates in one-eighth the time compared to manual review.
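
For a sense of scale, the figures quoted for the Mount Sinai trial can be turned into simple rates. The Python snippet below only does arithmetic on the numbers stated above (535,000 documents, 15 hours, 10 times the candidates, one-eighth the time); the variable names are ours, and the combined figure assumes the two ratios can be multiplied, which the source does not state explicitly.

```python
# Figures quoted in the text above.
documents = 535_000
hours = 15

docs_per_hour = documents / hours       # roughly 35,667 documents per hour
docs_per_second = docs_per_hour / 3600  # roughly 10 documents per second

# "10 times the number of eligible candidates in one-eighth the time"
candidate_ratio = 10
time_ratio = 1 / 8
# Eligible candidates found per unit of time, relative to manual review.
relative_rate = candidate_ratio / time_ratio

print(f"{docs_per_hour:,.0f} documents/hour, {docs_per_second:.1f} documents/second")
print(f"{relative_rate:.0f}x eligible candidates per unit time versus manual review")
```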


Examples of physician narrative with no equivalent structured ICD code

Physician Specialty | Clinical Trial Area | Narrative Example
Rheumatology | Systemic lupus erythematosus (SLE) | Malar rash (butterfly) is a typical SLE symptom that cannot be recovered from structured data because there is no ICD code.
Hepatology | Liver Disease/Cirrhosis | Child-Pugh score is used to assess prognosis of chronic liver disease and cirrhosis. However, there is no ICD code for Child-Pugh classification for liver disease (class C).
Cardiology | Chronic Heart Failure | Chronic heart failure with reduced ejection fraction (HFrEF) Stage C New York Heart Association (NYHA class II – IV) has a class 1 recommendation for mineralocorticoid antagonists (MRAs). However, patients with NYHA classification would only be found in clinical narrative echocardiography reports.
Cardiology | Stroke | Finding clinical trial participants with no history of CVA at risk of stroke is not possible with structured data, as no ICD codes exist for such complex patients.
All | All | Family History of; Ophthalmology problems; No indications of; No relapse in past six months; Difficulty Walking


Meaningful information, including patient signs and symptoms, is very often “lost in translation”.  The structured billing data is a poor substitute for the clinical stories physicians write about their patients. Today, coded billing data is often the only source used for cohort identification. This coded information lacks the nuanced clinical observations captured by physicians while documenting patient encounters, which are so important in truly understanding patient populations and outcomes.

  1. Womack JA, Scotch M, Leung SN, Skanderson M, Bathulapalli H, Haskell SG, Brandt CA. Use of Structured and Unstructured Data to Identify Contraceptive Use in Women Veterans. Perspectives in Health Information Management. AHIMA Foundation. 2013 Jul 1; 10:1e.

3 Key Areas to Improve Clinical Trials Using Patient Data and CLiX ENRICH for Clinical Trials

Unstructured narrative patient data, found in real world data sources such as progress notes and correspondence within electronic health records, contains a wealth of information that, if used correctly, can drive improvements in the clinical trial process. Currently, organizations must deploy expensive clinical personnel to manually review this existing rich and abundant resource for clinical trial recruitment.

The Mayo Clinic is using IBM Watson’s natural language processing (NLP) and data analytics capabilities to sift through millions of pages of clinical trial and patient data for subject recruitment. In addition, a study published by Cincinnati Children’s Hospital Medical Center assessing the effectiveness of NLP in clinical trials reported that the workload was reduced by 92% with a 450% increase in subject screening efficiency.

Clinithink’s CLiX ENRICH for Clinical Trials provides value for subject recruitment and beyond. By accessing rich, unstructured patient data found within electronic medical records (EMRs), case report forms (CRFs), patient reported outcomes (PROs), clinical trial management systems (CTMS), electronic data capture (EDC) systems, trip reports and other clinical documentation, Clinithink’s solution enables a positive impact during several critical points throughout the clinical trials process.

CLiX ENRICH for clinical trials video

In this video I explain how, when used for Feasibility, Subject Recruitment and Pharmacovigilance, CLiX ENRICH for Clinical Trials provides a distinct data advantage to realize:

  • Savings of time and money
  • Risk evaluation and reduction
  • Increased predictability
  • Optimized views of subject trial data

Conferences, Clinical Data and Rubik’s Cubes…Only at EHI Live 2014

This year’s EHI Live, held at the NEC Birmingham, UK, offered a pleasant mix of things to do and see for the 3,800 visitors made up of NHS (National Health Service) providers, commissioning organizations and exhibitors. The show, which saw a 46% rise in NHS attendees, gave delegates a taste of the future direction of healthcare IT with keynote presentations from Tim Kelsey, NHS England’s director of patients and information, and Health and Social Care Information Centre chief executive Andy Williams. Read more

Clinithink to Present at Upcoming IHTSDO SNOMED CT Implementation Showcase 2014

This will be an exciting year to catch up with Clinithink at the upcoming IHTSDO SNOMED CT Implementation Showcase 2014, because the abstract we submitted, entitled ‘Clinical Natural Language Processing Tools for SNOMED CT’, was accepted for presentation. Please plan to join us on Friday, October 31; the time and location of this presentation are listed here on our website. Read more

Top 5 Considerations When Choosing a CNLP Partner

The vast majority of clinical data, roughly 80%, is unstructured. And, the amount of data is expanding rapidly. In 2013, growth in healthcare data stood at 500 petabytes and will grow to 25,000 petabytes in 2020. Deriving insights from ALL clinical data, including this unstructured content, is essential to survive in this era of rapidly evolving healthcare.

Clinical natural language processing (CNLP) harnesses this rich vein of healthcare data. Yet it can be confusing to identify the right technology solution. Natural language processing (NLP) has been around for decades, but how do you find a solution built to solve real world problems in a clinical environment? Read more

Deja Vu: Accountable Providers Lack the Insight Needed to Judge Their Own Risk and Reward

I read an article the other day that has me reminiscing. The article compared the experience of two medical groups adjusting to the accountable care organization (ACO) model. One group will receive nearly a $1M bonus for achieving a variety of quality outcome metrics while the other remains uncertain how they will be affected – good or bad. Executive vice president of the latter organization, Tony Slonim, M.D. lamented, “It sounds ridiculous, but we have no sense of how much we might get. We know our average spend and we know our performance, but with case mix adjustment, it remains theoretical.” Dr. Slonim and colleagues are not alone. Read more

An ACO Imperative: Leveraging CNLP to Control Patient Leakage

The emerging Accountable Care Organization (ACO) model serves three objectives: improve the patient experience, manage the health of patient populations, and reduce the per capita cost of healthcare. To this end, ACO providers use evidence-based guidelines and tightly managed care coordination to ensure that patients receive appropriate care while avoiding unnecessary or ineffective treatments and their associated costs. The ACO’s financial future hinges on their ability to identify patient populations by current conditions, predict future patient needs based on medical histories and keep patients from straying outside the organization for care. Read more

Semantics, Semantics. Unlocking the meaning of unstructured data for healthcare analytics

In the era of accountable care, the healthcare information technology (HIT) needs of the modern health system are vast. Accountable Care Organizations (ACOs) are groups of doctors, hospitals, and other health care providers who coordinate high quality care for their patients to ensure they get the right care at the right time, while avoiding unnecessary duplication of services and preventing medical errors. To do this, providers must be able to match each patient with evidence-based care guidelines and track his or her treatment across the continuum of care, goals that require significant HIT investment. Read more

Beyond Speech Recognition: The Benefits of Moving to Clinical NLP

In the same way that voice recognition technology has strengthened to the point where its use is becoming ubiquitous, Natural Language Processing (NLP), and especially Clinical NLP (CNLP), is emerging as an extraordinarily powerful tool for harvesting knowledge from unstructured narrative data.

Though sometimes confused, speech recognition and CNLP are vastly different technologies. In a sense, speech recognition is like a pen, capturing thoughts to paper. NLP allows those thoughts to be freed from the paper by giving the computer the ability to extract meaning from the text. My colleague Dr. Tielman Van Vleck has written about this, including in his Clinical NLP in Plain English blog series. Read more

How Clinithink is Evolving CLiX to Meet Market Needs

Clinithink always enjoys talking with our partners because together we believe we are making major contributions to improving healthcare.

That’s a heady thought, and with the confluence of big data and the EHR/EMR that’s starting to happen, there should be a spectrum of advances to help clinicians and researchers identify new outcomes-based best practices and insights that can drive the practice of medicine forward. Read more

Capturing Change: The Big Ways Big Data Could Add Value to Healthcare

This article was first published in HIT Consultant on 05/29/2013

Clinithink’s Russ Anderson and Dr. Tielman Van Vleck help us break down big data’s potential by exploring the progress and promise unfolding from it.

Surely, you’ve heard all about big data and its promise to improve healthcare by now. If you haven’t, here’s the scoop: big data could save the U.S. healthcare system more than $3 billion annually, according to a McKinsey report. Yes, it seems big data is unstoppable and its potential undeniable. What remains unclear, however, is how big data’s potential will break down to truly add value to healthcare. Read more

5 Key Elements of Successful Clinical Documentation Solutions

This article was first published in HIT Consultant on 02/11/2013

Clinithink’s Chris Tackaberry and Peter Johnson explain five key elements of successful clinical documentation solutions to unlock unstructured clinical data.

There is no question that clinical documentation is becoming increasingly complex, as new quality measures and coding practices are introduced into healthcare. It’s ironic, in fact, that such reform efforts designed to bring clarity to healthcare services have proven to be nothing short of cumbersome and confusing. Read more

Interview: How Chris Tackaberry is changing the face of healthcare with Clinithink

Chris Tackaberry, co-founder and CEO of Clinithink, speaks to Your Hidden Potential about how Clinithink is transforming healthcare. The article was first published on 01/31/2013.

Chris Tackaberry is the co-founder and CEO of Clinithink, a UK-based healthcare software company. In our interview, Chris talks me through his journey as an entrepreneur, explaining how the idea for Clinithink came about and the challenges faced along the way. In addition, he also shares his knowledge on fundraising and what type of mind-set entrepreneurs should have when going into it, concluding with some great advice for first time entrepreneurs. Read more

NLP and Physician Workflow: An End to Physician Resistance?

The article was first published in HIStalk, Readers Write 01/09/2013

“I hate all the EMRs out there, including the one our practice just bought. Notes that come from an EMR have so much extra stuffing in them that it takes me forever to figure out what you guys really had to say about the patient I referred to you. I have to wade through lines and lines of empty verbiage to finally find a meaningful sentence or two that tells me what I need to know.” Read more

Paying Attention to How NLP Can Impact Healthcare

This post first appeared in HIStalk, Readers Write 11/19/2012

Unstructured clinical narrative is increasingly being seen as the primary source of sharable, reusable, and continually accessible knowledge, essential in helping providers make informed decisions, reduce costs, and ultimately improve patient care. While form-driven EHRs readily leverage and share captured structured data, the richest patient information remains locked inside EHR databases as unstructured notes. Read more

Big Data, Big Impact: And a Very Big Year in Healthcare

In recent years, there has been a vast proliferation of medical data being captured electronically, opening a world of big-data analytics never before possible. In 2012, healthcare saw these huge strides continue, with more and more providers recognizing the value of big data and seeking to understand and utilize the tools necessary to capitalize on it. The big data in question is coming not only from increased use of EHRs and increasingly economical genetic sequencing, but also from an assortment of technologies and instruments ranging from wearable devices to smartphone apps to natural language processing technologies that identify data from existing clinical notes. Read more

Puzzles and problem-solving

Mechanical puzzles have been around for hundreds of years but are currently enjoying a resurgence in popularity. They can be both mentally challenging and aesthetically pleasing. If you have been fortunate enough to receive one of the Clinithink puzzle cubes, I hope you are enjoying the challenge. Read more