12/5/2023
Estimating Cost Savings in Employer-Sponsored Health Plans
Nandan Rao
In this post, I will do my best to explain some statistical problems that often plague our industry. What industry is that? It includes anyone who offers, or evaluates, health and wellness services for payers with relatively small populations (thousands or tens of thousands of members), such as self-insured employers.
What are we trying to measure when we consider the value of a health and wellness service? What we want is the “impact,” which we can define as the causal effect of the service on the population. This effect is the difference between two scenarios:
- The scenario in which the population is offered the service (treatment).
- The scenario in which the population is not offered the service (control).
Note that, to measure the causal impact, the two scenarios must be identical in every respect except whether the service is offered. This is the very definition of causality offered by John Stuart Mill (his “method of difference”)!
Only one of these scenarios can be experienced by a population at any given point in time, so you can never measure both and compute their difference. This is known as the “fundamental problem of causal inference” (Paul Holland summarizing work with Donald Rubin).
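For readers who like notation, the same idea can be written in the potential-outcomes style associated with Rubin. The symbols below are our own labels, not something used elsewhere in this post: for each of the N people in the population, Y_i(1) is their outcome if the service is offered and Y_i(0) their outcome if it is not.

```latex
\tau \;=\; \frac{1}{N}\sum_{i=1}^{N} Y_i(1) \;-\; \frac{1}{N}\sum_{i=1}^{N} Y_i(0)
```

For any given person, only one of Y_i(1) and Y_i(0) is ever observed, which is exactly why the impact cannot be computed directly.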
Approximating this measurement as best we can, given the natural limitations of our world (time flows one way), is the science of causal inference. One of its key tools is the randomized controlled trial (RCT), which randomly splits a group into two parts so that each part approximates the other and the difference between them can be measured.
RCTs solve, as well as anything can, what we can call the identification problem: given a difference between two groups, how can we identify what caused it? In a randomized controlled trial, you can be confident that the difference between the two groups comes from the treatment. However, RCTs introduce a new problem: finite-sample statistics. Your two groups are similar, but not literally the same. We’ll call this the noise problem, since randomness is called “noise” in statistics and the two groups will differ in small, “random” ways.
Imagine you have a classroom of 30 children. Now split the classroom in half, picking kids at random, so that each half has 15 kids. Now give them a test. Which half will perform better?
Hopefully you sense that, beforehand, it’s impossible to know: each half is equally likely to perform better or worse on any test. After the fact, however, there will usually be a difference. There might be one kid in the class who always scores 100% on every test. She will be in one half or the other, shifting that half’s average. Because there are only 15 kids in each half, this one kid can easily move the average.
Now imagine there are a million kids and you split them in half and give them the same test. Hopefully it’s easy to imagine that the averages of the two halves will be very close to identical. One smart kid on either side can’t make much of a difference, because there will be thousands of smart kids on both sides.
Now imagine you are back to having 30 kids, but every kid gets a perfect score of 100% on this test, always. Then there is no noise problem: the two halves will always have equal scores.
The noise problem depends on two things: the variability in our underlying data (do scores vary between kids, and by how much?) and the size of our data (how many kids are in our populations?).
This noise problem exists independently of the identification problem, and it always exists in the real world, with finite data. In measuring the impact of a health and wellness service on a population, especially a small population, we have to deal with both the identification problem and the noise problem.
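To make the two ingredients concrete, here is a small Python sketch of the classroom example under illustrative assumptions: test scores are drawn from a normal distribution (mean 75, standard deviation 15) and clipped to 0-100, and the function names are hypothetical. It splits one fixed class at random many times and reports the typical gap between the two halves’ averages.

```python
import random


def mean(values):
    return sum(values) / len(values)


def split_gap(scores, rng):
    """Randomly split the class into two equal halves; return |difference in means|."""
    shuffled = list(scores)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return abs(mean(shuffled[:half]) - mean(shuffled[half:]))


def typical_gap(n_kids, mean_score, score_sd, n_replays=500, seed=0):
    """Average gap between the halves over many random splits of one fixed class."""
    rng = random.Random(seed)
    # Scores drawn once (an illustrative assumption), clipped to the 0-100 range.
    scores = [min(100.0, max(0.0, rng.gauss(mean_score, score_sd))) for _ in range(n_kids)]
    # Replay the random split many times and average the size of the gap.
    return mean([split_gap(scores, rng) for _ in range(n_replays)])


small = typical_gap(30, 75, 15)    # small class, scores vary
large = typical_gap(3000, 75, 15)  # large class, scores vary
uniform = typical_gap(30, 100, 0)  # small class, everyone scores 100%
print(f"30 kids, varied scores:    typical gap {small:.2f} points")
print(f"3000 kids, varied scores:  typical gap {large:.2f} points")
print(f"30 kids, everyone at 100%: typical gap {uniform:.2f} points")
```

Under these assumptions the gap shrinks sharply going from 30 to 3,000 kids, and it is exactly zero when every kid scores 100%, because there is no variability left for the split to pick up.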
What does this all have to do with population health and insurance payers?
Well, we are in the business of improving health. One of the outcomes of improved health is lower cost to the insurance payer. That might be the primary interest, or it might be a secondary interest, depending on the payer, but it is often of interest.
We cannot run randomized controlled trials with our clients, so we are stuck with the identification problem. Additionally, our clients have relatively small populations (thousands), so we also have the noise problem.
Let’s consider two scenarios: one in which we analyze health (lab results), the other in which we analyze costs (total claims cost from claims data).
One of the simplest ways to analyze the data is a pre-post analysis: we compare costs (or health) in the population before the service is offered with costs (or health) after it is offered. Say we implement a program in 2021 and want to measure its impact in 2023. What’s the identification problem with a pre-post analysis?
It has to do with time. From 2021 to 2023, the population could have changed even without your intervention. Let’s consider how time acts as a “confounder” (something else that changes alongside the wellness service being offered).
In the health dimension, do we expect your population to be healthier in 2023 than in 2021? Did anything change? It could be that there are trends in the population and everyone is getting healthier, or less healthy, during that time. But it’s hard to imagine time having a huge impact on health (unless, for example, there is a global pandemic and everyone is locked inside).
Now consider the cost dimension: do we expect costs to change from 2021 to 2023? Well, a lot of things probably changed. Each year your health plan covers different products and services, the prices doctors bill change, the doctors covered change, and the doctors and health systems themselves change their practices. Usually, this causes costs to go up. But by how much in two years? It depends on the two-year period!
Hopefully you see that, when it comes to the identification problem, health outcomes are far less affected by the time confounder than costs are.
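A minimal Python sketch of why the time confounder matters more for costs. It assumes, purely for illustration, that the post-period average equals the pre-period average grown by a secular trend plus the program’s true effect; all numbers (an HbA1c with no trend, claims costs trending up 7% per year, true effects of -0.5 points and -$300) are hypothetical, and the function names are our own.

```python
def simulate_post(pre_value, annual_trend, years, true_effect):
    """Post-period average: the pre-period average grown by a secular time trend,
    plus the program's true effect (negative = improvement)."""
    return pre_value * (1 + annual_trend) ** years + true_effect


def pre_post_estimate(pre_value, post_value):
    """Naive pre-post 'impact': post minus pre. Any change caused by time alone
    (the confounder) is silently folded into this number."""
    return post_value - pre_value


# Hypothetical inputs for a 2021 -> 2023 comparison (2 years).
# Health: mean HbA1c (%), assumed to have no secular trend; true effect -0.5 points.
a1c_pre = 8.5
a1c_post = simulate_post(a1c_pre, annual_trend=0.0, years=2, true_effect=-0.5)

# Cost: annual claims cost per member ($), assumed to trend up 7%/yr; true effect -$300.
cost_pre = 6000.0
cost_post = simulate_post(cost_pre, annual_trend=0.07, years=2, true_effect=-300.0)

print(f"HbA1c pre-post estimate: {pre_post_estimate(a1c_pre, a1c_post):+.2f} points (true -0.50)")
print(f"Cost pre-post estimate:  {pre_post_estimate(cost_pre, cost_post):+,.2f} $ (true -300.00)")
```

With these made-up numbers the health estimate recovers the true effect, while the cost estimate comes out at roughly +$569 per member, making a program that saved $300 look like it raised costs.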
Now let’s consider the noise problem: how are these outcomes, costs vs. health, impacted by the noise problem?
The population size is the same, so the question is variability. Consider someone with diabetes and poor blood sugar control. How variable is their HbA1c score? And how variable are their costs?
If you take an HbA1c test multiple times, you will get different results; a change of 0.2-0.3 percentage points from test to test would be expected. This is variability, but it is relatively contained. In particular, you will never see a huge difference: someone with poor blood sugar control whose true score is 12% will never get an HbA1c result of 6% or 20%. So if you take 20 people with a true score of 12%, their measured scores will vary between roughly 11.5% and 12.5%. If you replay the scenario a hundred times, the scores will stay within that range.
What about the costs for a person with poor blood sugar control (an HbA1c of 12%) in any given year? There is a low probability of a bad event caused by diabetes, such as a complication that requires an ER visit. That probability is not zero, and it is much higher than for someone with good blood sugar control. If you take 20 such people in a given year, you could easily have none of them go to the ER with diabetes-related complications. But if you replay that scenario a hundred times, you will get different outcomes: some years one person will have an emergency, and some years two people will. Your ER costs for those 20 people will range from $0 to $10,000, and even to $50,000-$200,000 in some years!
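The contrast above can be sketched as a small Python simulation. Every parameter here is an illustrative assumption rather than a clinical or actuarial estimate: measured HbA1c is the true 12% plus normal test noise with a standard deviation of 0.25 points, each person independently has a 5% chance of a diabetes-related ER visit in a year (at most one per person in this sketch), and each visit costs a flat $25,000.

```python
import random


def replay_year(rng, n_people=20, true_a1c=12.0, a1c_noise_sd=0.25,
                er_probability=0.05, er_cost=25_000):
    """One replay of a year for n_people with poor blood sugar control.

    Returns (mean measured HbA1c, total ER cost). All defaults are
    hypothetical, chosen only to illustrate the difference in noise.
    """
    # Lab results: the true value plus small, contained measurement noise.
    measured = [rng.gauss(true_a1c, a1c_noise_sd) for _ in range(n_people)]
    # Costs: rare, expensive events (at most one ER visit per person here).
    er_visits = sum(1 for _ in range(n_people) if rng.random() < er_probability)
    return sum(measured) / n_people, er_visits * er_cost


rng = random.Random(42)
replays = [replay_year(rng) for _ in range(100)]
a1c_means = [a1c for a1c, _ in replays]
costs = [cost for _, cost in replays]

print(f"Mean HbA1c across 100 replays: {min(a1c_means):.2f}% to {max(a1c_means):.2f}%")
print(f"ER cost across 100 replays:    ${min(costs):,} to ${max(costs):,}")
```

Across the replays the group’s average HbA1c barely moves, while the group’s ER bill swings from nothing to tens of thousands of dollars, even though the 20 people are equally sick in every replay.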
So what can we do?
At Kannact, this is how we think about health and its related claims costs:
Firstly, health is our outcome of interest. It is the only thing our intervention directly affects: we change health, and we measure it in the most robust way possible, with lab results.
Secondly, we know that better health leads to lower costs. On average. However, in any particular year, for any small population, the variability in costs is so high that a healthier population can cost more than a less healthy one!
Because of this, we measure and report health outcomes. However, because we know our clients care about costs, we convert the impact on health outcomes into cost-savings numbers. How do we get those? We look at published medical studies, which analyze huge datasets over long periods of time, and we use their estimates: how much more will someone with an HbA1c score above 8% cost compared to someone with an HbA1c score below 8%?
Through this method, we tackle both the identification problem and the noise problem, while quantifying impact in the numbers that drive decision making: dollars.
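One simple way to apply such a literature estimate is sketched below in Python. This is not Kannact’s actual model: the function names, the member IDs and lab values, and the $1,500-per-year cost gap are all hypothetical, and treating an HbA1c of exactly 8% as part of the higher group is a boundary choice of this sketch.

```python
def members_moved_below(before, after, threshold=8.0):
    """Count members whose HbA1c was at or above `threshold` before and below it after.

    `before` and `after` map a member ID to an HbA1c lab value (%). Members with
    no follow-up lab are treated as unchanged (an assumption of this sketch).
    """
    return sum(
        1
        for member_id, a1c_before in before.items()
        if a1c_before >= threshold and after.get(member_id, a1c_before) < threshold
    )


def estimated_annual_savings(before, after, cost_gap_per_member, threshold=8.0):
    """Convert the health change into dollars using a per-member annual cost gap
    between the two HbA1c groups, as would be taken from a published study."""
    return members_moved_below(before, after, threshold) * cost_gap_per_member


# Hypothetical lab values and a hypothetical $1,500/yr cost gap, for illustration only.
before = {"m1": 9.1, "m2": 8.4, "m3": 7.2, "m4": 10.3}
after = {"m1": 7.8, "m2": 8.1, "m3": 6.9, "m4": 7.9}
print(members_moved_below(before, after))  # 2 (m1 and m4)
print(estimated_annual_savings(before, after, cost_gap_per_member=1_500))  # 3000
```

The noisy part, the dollar figure per member, comes from a large published dataset, while the part measured on the client’s own small population is the low-noise lab result.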
Want to see what that looks like in practice? Download our cost-savings white paper to see the impact that we have on costs, translated from reliable, low-noise health data!