Transcript: Dennis Shung, MD, on Machine-Learning Risk Score for Upper GI Bleeding

Hello, my name is Dennis Shung from the Yale School of Medicine. Today, I want to talk a little bit about a new machine-learning score used to stratify risk for upper gastrointestinal bleeding that my colleagues and I have recently published in Gastroenterology.

Currently, there are a number of available scores that can be used to risk-stratify for upper gastrointestinal bleeding. The most validated and useful score, currently, is the Glasgow‑Blatchford Score. This has been recommended in the most recent international guidelines for upper gastrointestinal bleeding.

However, it has been shown that only very specific thresholds are really helpful for practitioners in the community, namely the very low‑risk thresholds. This is usually a Glasgow‑Blatchford Score of zero or one.

These scores do perform well, in that they have near 100% sensitivity for making sure that patients who need an intervention or need to be in the hospital do not go home. But they do have limitations: because they're so stringent, they actually only capture a small proportion of patients who are truly low risk.

Another challenge for these scores is usability, because these, especially the Glasgow‑Blatchford Score, require you to remember a lot of the cutoffs and a lot of the variables just off the top of your head. Even if you use an online calculator, you are stuck manually inputting these numbers that you see as you take care of patients and then obtaining a score. It takes quite a while for you to be able to manually input the numbers, get a result, and then remember how to interpret the result. The availability of these scores and the ability to use them at the point of care is limited.

The most recent data we have to suggest any amount of uptake is from around 2014 by Dr Sulzman and colleagues, when they did a survey of practitioners in the community: gastroenterologists, internists, and emergency physicians. Only 30% of those surveyed had actually used an upper GI bleeding score.

There's definitely a lot of challenges in implementing these scores. As I mentioned before, these scores only capture a very small percentage of patients who are truly low risk. The ideal for any physician taking care of patients is to give them the right care at the right time. Right now, the problem with gastrointestinal bleeding is that people who are nongastroenterologists have limited experience, and they tend to admit patients for observation or for in‑patient stay when they really are low risk and should be discharged and sent home.

From a physician's point of view, I wouldn't want my patient to receive unnecessary care in the hospital when they could be just as happy back home. From a system's point of view, having patients who are low risk taking up resources that could be used for high‑risk patients doesn't make sense and leads to a lot of unnecessary utilization and resource waste. From a gastroenterologist's point of view, we always want and prefer our patients to be seen in the outpatient setting, where we can give them safe anesthesia, and make sure we can take a look at what's going on with them and help them navigate that. But it's not always the best thing for patients to be in the hospital for unnecessary evaluation.

From a reimbursement standpoint, gastroenterologists are more and more being involved in this idea of value‑based care. If we perform endoscopies on patients who are very low risk, number one, it unnecessarily exposes a lot of patients to anesthesia that they may not need. Number two, it doesn't really add to any visible outcomes that would lead to improvement in patient health.

From a gastrointestinal bleeding perspective, we really want to be able to provide the best care at the right time to the right patient. The key motivating factor behind these scores is that if we find that [patients are] very low risk, the right place for them is at home, and they're coming to see us in the outpatient clinic. If they're sicker, they're actively bleeding, and they could receive one of the interventions that can only be given in the hospital, then by all means, they should be in the hospital and also undergo urgent endoscopic evaluation and intervention.

In order to set up the study, we took 4 centers in the US and Europe with patients who were prospectively entered into a registry for upper GI bleeding. These patients all had risk scores calculated.

These included the common pre‑endoscopic risk scores: Glasgow‑Blatchford, pre‑endoscopic Rockall, and AIMS65. The patients also had the outcomes that we're interested in recorded, which are mortality, hemostatic intervention, or blood transfusion.

The way that we thought about this was that if we want to compare apples with apples, we really want to compare machine-learning models developed from data from US and European sites to the risk scores. The Glasgow‑Blatchford, for example, was developed in Scotland. Because these scores come from Western populations, we wanted to make sure that we had a good comparison.

The next step was to externally validate these machine-learning models, which were trained on these 4 US‑European centers, on 2 centers in Asia Pacific: one in New Zealand and the other in Singapore. Presumably, both the risk scores and our machine-learning models would be seeing new data, because they were developed on Western populations and were now being applied to Asia‑Pacific populations.
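As a rough illustration of the design described here (develop on one group of centers, then test on centers the model has never seen), a minimal sketch might split registry records by site. The site labels and record fields below are hypothetical placeholders, not names from the study.

```python
# Hypothetical site labels: the talk only specifies 4 US/European centers
# for model development and 2 Asia-Pacific centers for external validation.
DEVELOPMENT_SITES = {"us_eu_1", "us_eu_2", "us_eu_3", "us_eu_4"}
EXTERNAL_SITES = {"new_zealand", "singapore"}


def split_by_site(records):
    """Split records into development and external-validation sets by site.

    Each record is a dict with at least a "site" key. Unknown sites raise
    an error so that no record slips into either set by accident.
    """
    development, external = [], []
    for record in records:
        site = record["site"]
        if site in DEVELOPMENT_SITES:
            development.append(record)
        elif site in EXTERNAL_SITES:
            external.append(record)
        else:
            raise ValueError(f"unknown site: {site!r}")
    return development, external


# Made-up records, for illustration only.
records = [
    {"site": "us_eu_1", "patient_id": 1},
    {"site": "singapore", "patient_id": 2},
    {"site": "us_eu_3", "patient_id": 3},
    {"site": "new_zealand", "patient_id": 4},
]
development, external = split_by_site(records)
```

Splitting by center rather than at random is what makes the second set an external validation: no patient from the held-out centers influences model fitting.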

We initially did a methodological study where we looked at a variety of machine-learning models. When we think about machine learning, it's not just one thing. It really encapsulates a variety of mathematical models that are used to find patterns in data.

We looked at 5 different classes of machine-learning model and created 5 different models for all of the outcome measures that I described. The outcome measure that was the most important, and that we ended up publishing on, was a composite measure that took all 3: 30‑day mortality; hemostatic intervention, either endoscopic, surgical, or interventional radiological; and also blood transfusion.

This composite outcome really gives the best idea for which patients need to be in the hospital and which patients could potentially be managed as an outpatient.
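To make the composite concrete, here is a minimal sketch of how such a label could be derived per patient: it is positive if any one of the three component outcomes occurred. The argument names are illustrative assumptions, not field names from the registry.

```python
def composite_outcome(died_within_followup, hemostatic_intervention, transfused):
    """Return True if any component of the composite outcome occurred.

    hemostatic_intervention covers endoscopic, surgical, or interventional
    radiological hemostasis; all three arguments are booleans.
    """
    return bool(died_within_followup or hemostatic_intervention or transfused)


# Made-up patients, for illustration only.
patients = [
    {"died": False, "hemostasis": False, "transfused": False},
    {"died": False, "hemostasis": True, "transfused": False},
    {"died": False, "hemostasis": False, "transfused": True},
]
labels = [
    composite_outcome(p["died"], p["hemostasis"], p["transfused"])
    for p in patients
]
```

Only the first patient, with none of the three events, would count as not needing hospital-based care under this definition.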

When we compared the performance of the machine-learning models to the clinical risk scores, we were able to see that some models performed better than others. Subsequently, we developed the one model that really beat all the clinical risk scores, which is a gradient‑boosted decision-tree model. We showed that it performed better overall than the Glasgow‑Blatchford Score for predicting the composite outcome for need for hospital stay. It also performed significantly better in identifying low‑risk patients.
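For readers unfamiliar with the term, gradient boosting builds a model as a sum of many small trees, each fitted to the errors left by the trees before it. The sketch below is a deliberately simplified, from-scratch illustration using one-split trees (stumps) and log loss. It is not the study's model or software, and the features, data, and hyperparameters are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the negative gradient of log loss (y - p)."""
    scores = [0.0] * len(X)  # running log-odds for each training row
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, threshold, left_value, right_value = fit_stump(X, residuals)
        # Store leaf values already scaled by the learning rate.
        stump = (j, threshold, learning_rate * left_value, learning_rate * right_value)
        stumps.append(stump)
        scores = [s + (stump[2] if row[j] <= threshold else stump[3])
                  for row, s in zip(X, scores)]
    return stumps


def predict_proba(stumps, row):
    """Sum the leaf values of all stumps and map the log-odds to a probability."""
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return sigmoid(score)


# Made-up toy data: [heart rate, hemoglobin]; 1 = composite outcome occurred.
X = [[70, 14.0], [80, 13.5], [110, 9.0], [120, 8.0], [75, 12.5], [105, 10.0]]
y = [0, 0, 1, 1, 0, 1]
stumps = fit_boosted_stumps(X, y)
high = predict_proba(stumps, [115, 8.5])
low = predict_proba(stumps, [72, 13.8])
```

Production libraries add deeper trees, regularization, and second-order leaf updates, but the core loop of fitting each new tree to the current errors is the same idea.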

On the low‑risk side, previous guidelines have suggested a Glasgow‑Blatchford Score of zero to identify the very, very low‑risk patients who can be discharged. This is actually standard practice in several hospitals in Europe and the UK. If you have a Glasgow‑Blatchford Score of zero, then you're sent home and then followed up with an outpatient endoscopy.

When we compared our machine-learning model at the cutoff that was similar to the GBS of zero, we were able to identify two-and-a-half times more low‑risk patients than the Glasgow‑Blatchford Score of zero. These are preliminary data that suggest that if applied on a large population of all comers coming with overt signs of upper GI bleeding, we could potentially send home two-and-a-half times more patients instead of having them potentially stay in the hospital for 24 or 48 hours for observation.
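The comparison described here can be sketched as follows: pick the model cutoff that keeps sensitivity at least as high as the rule "GBS above zero is high risk", then count how many patients each approach labels low risk. This is one plausible reading of "a cutoff similar to the GBS of zero", stated as an assumption. All numbers below are made up for illustration and are not study data.

```python
def sensitivity(flagged_high_risk, outcomes):
    """Fraction of patients with the outcome who were flagged high risk."""
    flags_for_positives = [
        flag for flag, outcome in zip(flagged_high_risk, outcomes) if outcome
    ]
    return sum(flags_for_positives) / len(flags_for_positives)


def highest_cutoff_at_sensitivity(probabilities, outcomes, target_sensitivity):
    """Largest cutoff such that flagging p >= cutoff meets the target sensitivity."""
    best = None
    for cutoff in sorted(set(probabilities)):
        flagged = [p >= cutoff for p in probabilities]
        if sensitivity(flagged, outcomes) >= target_sensitivity:
            best = cutoff
    return best


# Made-up toy cohort of 10 patients; 1 = composite outcome occurred.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gbs = [5, 3, 2, 0, 0, 1, 2, 1, 3, 4]
model_probability = [0.9, 0.7, 0.4, 0.05, 0.1, 0.15, 0.2, 0.45, 0.5, 0.6]

# Clinical rule: a GBS of zero is low risk, anything above is flagged.
gbs_flagged = [score > 0 for score in gbs]
gbs_sensitivity = sensitivity(gbs_flagged, outcomes)
gbs_low_risk = gbs_flagged.count(False)

# Model: match the rule's sensitivity, then count patients below the cutoff.
cutoff = highest_cutoff_at_sensitivity(model_probability, outcomes, gbs_sensitivity)
model_low_risk = sum(p < cutoff for p in model_probability)
```

Holding sensitivity fixed is what makes the comparison fair: both approaches miss the same share of patients who need hospital care, so the only difference is how many truly low-risk patients each one identifies.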

In order to make this not just a theoretical benefit, we actually have been able to host the machine-learning algorithm on an app that's online for anyone to access across the world to try your hand at looking at it and seeing if it does indeed follow your intuition.

We also have a survey that's part of it, but the main thing is we want to make sure that people know that this is something that you can try out. Of course, it's experimental and obviously not recommended or validated in any way in any prospective trial, but at least you can put your hands on something that has been validated in a prospective registry dataset.

The implications of the study are 2‑fold. The first is a proof of concept that these machine-learning tools can indeed perform better than what we've been using in the past. This means that, as we move forward, these tools will become more and more prevalent, especially when we're trying to assess risk in anything that has clinical data for gastroenterologists.

The second implication is that now with electronic health records, a lot of the data that before we had to write down or had to record somewhere manually is being automatically generated and populated.

Because of this, there's a very exciting prospect of integrating these machine-learning models within the electronic health record, which means that when you have a patient who's being evaluated in the emergency room, all of these fields will automatically populate, calculate the score, and provide the score to the provider who's seeing them in the emergency department.
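As a sketch of what that integration could look like in principle, the function below pulls structured fields from a record, reports any that are missing instead of guessing, and otherwise passes the values to a scoring function. The field names, the record layout, and the stand-in model are all hypothetical; no real electronic health record interface is implied.

```python
# Hypothetical structured fields; a real model would define its own inputs.
REQUIRED_FIELDS = ("age", "heart_rate", "systolic_bp", "hemoglobin")


def score_from_ehr(ehr_record, model):
    """Compute a risk score from structured fields, or report what is missing.

    ehr_record is a dict of field name to value (None if not yet recorded).
    model is any callable that maps a list of floats to a risk score.
    """
    missing = [name for name in REQUIRED_FIELDS if ehr_record.get(name) is None]
    if missing:
        return {"score": None, "missing": missing}
    features = [float(ehr_record[name]) for name in REQUIRED_FIELDS]
    return {"score": model(features), "missing": []}


def stand_in_model(features):
    """Placeholder that returns a constant; a trained model would go here."""
    return 0.5


complete = {"age": 67, "heart_rate": 88, "systolic_bp": 124, "hemoglobin": 13.1}
incomplete = {"age": 67, "heart_rate": 88, "systolic_bp": 124, "hemoglobin": None}

result_complete = score_from_ehr(complete, stand_in_model)
result_incomplete = score_from_ehr(incomplete, stand_in_model)
```

Returning the list of missing fields, rather than a score computed from defaults, keeps the provider aware of when the tool has nothing reliable to say.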

As gastroenterologists, we may not even hear about low‑risk patients until the next morning when they say, "Hey, this patient came in. They're very low risk, and we want to make sure they follow up with you as an outpatient." This cuts down on unnecessary calls in the middle of the night for patients who are truly low risk.

It helps providers who don't have much familiarity with GI bleeding really be able to take care of patients at a much higher level, to be able to say, "OK, based on this algorithm and based on these factors, these patients can go home. I'll make sure they follow up as an outpatient with their GI provider."

One of the exciting new directions that I'm working on is not even looking at these registry data painstakingly extracted by nurses, medical students, and even physicians. Really using electronic health records, we can just have structured data fields that are entered in routinely and use those to construct machine-learning models that will hopefully perform better than any of the clinical risk scores that currently exist.

By doing that, you can make tools that are not just one tool to rule them all, one machine-learning model to be used everywhere, but customized local machine-learning models that can take into account your patient population and the epidemiology of your specific center, and help predict for people who come into your emergency room and not just people from across the world.

That really is the future, personalizing the risk assessment for patients in different areas, because there's just so many things that could change that could make a general machine-learning model fail. It's really important to think about how we can provide the best care to our patients.

That is taking all the resources that we have and knowledge that we have about the pathophysiology of GI bleeding, the epidemiology, and the clinical characteristics that are high risk, then boiling that down and customizing that for each specific center with their own electronic health records so that it can be automatic, customized to that center and those patients, and as accurate as humanly possible.

Several key takeaways, messages to think about are...

Number one, the increasing role of these new computational tools, or machine-learning algorithms, in our clinical care. I'm saying that these tools are very powerful, and they are here to stay. They will be more and more part of our modern practice of medicine.

Number two, understanding how to critically think about these tools, because the reason our study was published was that we had very rigorous methodology. We made sure that we had external validation. We thoughtfully constructed the models and compared different models and made sure that we were doing everything as rigorously as possible.

That's not the case for a lot of the machine-learning models that are being published here and there. It's important for practitioners to start having awareness of what makes a rigorously tested and validated machine-learning model that they can trust.

The third thing is that the future is not just in these complex models that will require more and more data, but it's a partnership between humans and computers. We can't keep track of the explosion of data that's out there. These are tools that can help boil down and condense all of these different sources of data into something that we can use at the bedside.

These tools, again, are just tools. They're not meant to supplant physicians. They're not meant to replace anybody. They're meant to help physicians make better decisions for their patients. That's what I hope that this paper will give you a glimpse into as the future of using tools to provide better patient care.

Thank you for listening today. I hope this is a helpful introduction to machine learning. Have a good day.