
Looking at Data: Hospital Costs

I’m trying out a new little side project, taking a page out of the playbook of Romy Misra, who (intermittently) writes about and analyzes a public data set (here: http://romymisra.com/category/looking-at-data/).

I’ll mostly be doing these analyses in Tableau, but there’s nothing preventing me from doing it in any number of free tools such as IBM Many Eyes, Plotly, or R. In the future I might do just that, but for now I’m sticking with Tableau.

Anyway, this week I’ll be walking you through an analysis I did on hospital cost data.

This data set relates the costs of various common ailments to Medicare payments under the Inpatient Prospective Payment System (IPPS), which Medicare uses for billing purposes. There’s no date stamp on the data (boo!), so I will assume it’s the most recent publicly available data as of this writing (early February 2014).

Here’s a link to the data: https://data.cms.gov/Medicare/Inpatient-Prospective-Payment-System-IPPS-Provider/97k6-zzx3

There are 161k records in this data, spanning the 100 most common medical diagnosis groups across all the major Medicare-accepting providers in the US.

Questions:

  1. Which states have the highest IPPS costs per provider?
  2. Which states have the highest IPPS costs per patient discharge? (We’ll use patient discharge as a proxy for a “unit” of medical care given.)
  3. Which diagnoses cost the IPPS system the most?
  4. What are the most common medical diagnosis types, across the US and broken down by state?
  5. Does competition increase or decrease IPPS costs for Medicare?
  6. Does an increase in diversity of what a provider treats increase or decrease costs?

I will address the first three questions first: these questions largely give context for medical payments, which is super-helpful when you’re trying to address any question.

Looking at the data, we see that Texas and California have by far the largest numbers of providers participating in the IPPS, and that there’s generally a higher concentration of providers east of the Mississippi River.

Addressing questions 1 and 2, a couple of states really stand out for having high IPPS costs, and they’re largely on the East Coast. Maryland takes the prize for the highest costs per provider of any state, coming in at almost $1 million per provider. Washington DC and New Jersey round out the top three.

When we look at costs per provider per hospital discharge, a similar pattern emerges: this time DC takes the cake, with Maryland second and New Jersey third.

However, if we look at costs just by hospital discharge, a completely different set of states emerges as high-cost: California’s cost per discharge is almost twice that of the runner-up state (Texas)! New York and Florida round out the top four. Beyond these four states, the costs drop pretty quickly.
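For anyone who would rather follow along in code than in Tableau, here is a minimal Python sketch of one way to build these per-state rollups. The field names (state, provider_id, discharges, avg_payment) and the toy rows are stand-ins made up for illustration, not the real column headers or figures from the CMS file, and the sketch assumes that a row's total cost is its average payment times its number of discharges. The Tableau views may well aggregate differently.

```python
from collections import defaultdict

# Toy rows, made up for illustration. Field names are assumptions,
# not the actual column headers of the CMS file.
rows = [
    {"state": "MD", "provider_id": 1, "discharges": 10, "avg_payment": 9000.0},
    {"state": "MD", "provider_id": 1, "discharges": 20, "avg_payment": 6000.0},
    {"state": "CA", "provider_id": 2, "discharges": 30, "avg_payment": 8000.0},
    {"state": "CA", "provider_id": 3, "discharges": 10, "avg_payment": 12000.0},
]


def state_rollup(rows):
    """Return {state: (cost_per_provider, cost_per_discharge)}."""
    total_cost = defaultdict(float)
    total_discharges = defaultdict(int)
    providers = defaultdict(set)
    for r in rows:
        # Assumption: total cost of a row = average payment x discharges.
        total_cost[r["state"]] += r["avg_payment"] * r["discharges"]
        total_discharges[r["state"]] += r["discharges"]
        providers[r["state"]].add(r["provider_id"])
    return {
        s: (total_cost[s] / len(providers[s]), total_cost[s] / total_discharges[s])
        for s in total_cost
    }


rollup = state_rollup(rows)
for state, (per_provider, per_discharge) in sorted(rollup.items()):
    print(state, per_provider, per_discharge)
```

Sorting the result by either element of the tuple gives a ranking of states by that measure.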

So far we know that California’s costs per discharge are through the roof, but why? Let’s tackle question 3 next: what diagnosis groups are the most expensive for Medicare?

Turns out those are Major Small and Large Bowel procedures, Respiratory conditions that require ventilators, and Septicemia or Severe Sepsis. Interestingly, psychoses are the “cheapest.”
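A rough sketch of how such a ranking could be computed, assuming "most expensive" means the highest average payment per discharge, weighted by the number of discharges in each row. The tuple layout and the numbers are invented for illustration, and ranking by total payments instead would be an equally reasonable reading.

```python
from collections import defaultdict


def rank_drgs_by_payment(rows):
    """Return [(drg, weighted_avg_payment)], most expensive first.

    Each row is (diagnosis_group, discharges, average_payment).
    """
    cost = defaultdict(float)
    discharges = defaultdict(int)
    for drg, n, avg_payment in rows:
        cost[drg] += n * avg_payment
        discharges[drg] += n
    averages = {d: cost[d] / discharges[d] for d in cost}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Toy rows with made-up numbers, not values from the CMS data.
toy_rows = [
    ("SEPTICEMIA", 10, 20000.0),
    ("SEPTICEMIA", 30, 16000.0),
    ("PSYCHOSES", 20, 5000.0),
    ("PNEUMONIA", 40, 7000.0),
]
ranking = rank_drgs_by_payment(toy_rows)
print(ranking)
```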

If we look at what was treated by state, you’ll see that California treated a disproportionately large number of these expensive conditions. Aha! I know we’re just drawing correlations here, but at least we’re generating some hypotheses for further investigation. That’s one of the major reasons for doing this kind of exploratory data analysis!

Now, what are the overall most treated illnesses? No surprises here: pneumonia, heart failure and shock, COPD, and cardiac arrhythmias. These are also largely the most common diagnoses when we break the data out by state (the relative ranking changes from state to state, but the general pattern is still true).
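Counting the most treated diagnosis groups, overall and per state, is a natural fit for a counter. Here is a minimal sketch, assuming "most treated" means the largest total number of discharges; the row layout and the counts are invented for illustration.

```python
from collections import Counter, defaultdict


def most_common_drgs(rows, top=3):
    """Return (overall_top, {state: state_top}) ranked by total discharges.

    Each row is (state, diagnosis_group, discharges).
    """
    overall = Counter()
    by_state = defaultdict(Counter)
    for state, drg, n in rows:
        overall[drg] += n
        by_state[state][drg] += n
    return overall.most_common(top), {
        s: c.most_common(top) for s, c in by_state.items()
    }


# Toy rows with made-up counts, not values from the CMS data.
toy_rows = [
    ("TX", "PNEUMONIA", 50),
    ("TX", "HEART FAILURE", 40),
    ("CA", "HEART FAILURE", 30),
    ("CA", "PNEUMONIA", 10),
    ("CA", "COPD", 20),
]
overall, by_state = most_common_drgs(toy_rows)
print(overall)
print(by_state["CA"])
```

In the toy data the national ranking and the per-state rankings differ in order, which mirrors the point above that relative rankings shift from state to state.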

Finally, let’s address the effect of competition and breadth of treatment offered by provider on costs.

Does competition, which for this analysis I define as the number of providers in a given hospital referral region, increase or decrease costs?

If we look at a scatterplot of the average payment per hospital discharge against the number of providers in a hospital referral region, we’ll see that there’s a roughly linear relationship between costs and competition. The funny part is, it appears that as the number of providers in a region increases, the costs per discharge actually INCREASE! Again, this is just a correlation, not causation, but it looks like we’ve found another interesting hypothesis to test.
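To put a number on "roughly linear," one option is to compute the Pearson correlation and the slope of a least-squares trend line. The sketch below does both from scratch; the two lists are invented stand-ins for "providers per referral region" and "average payment per discharge," not figures from the data set.

```python
import math


def pearson_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Made-up values for illustration only.
providers_per_region = [1, 2, 3, 4, 5]
payment_per_discharge = [6000.0, 6500.0, 7100.0, 7400.0, 8000.0]

r, slope = pearson_and_slope(providers_per_region, payment_per_discharge)
print(round(r, 3), slope)
```

A positive slope with r close to 1 would match the pattern in the scatterplot, though it says nothing about causation.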

There’s one outlier that really stands out in this scatterplot: Los Angeles, which seems to have really high costs per discharge. That’s something to keep in mind should we dig more into this question. Also, note that the variance is not uniform (fancy term: heteroscedastic; in this case the spread of the points widens as the two variables in the scatterplot increase).
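A crude way to check for non-uniform variance without any plotting is to fit a trend line, split the residuals into a low-x half and a high-x half, and compare their variances. This is only a back-of-the-envelope sketch in the spirit of a Goldfeld-Quandt test, not a formal test, and the numbers are invented for illustration.

```python
from statistics import pvariance


def residual_variance_ratio(xs, ys):
    """Fit y = a + b*x by least squares, then return
    var(residuals in the high-x half) / var(residuals in the low-x half).
    A ratio well above 1 hints that the spread grows with x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    # Sort (x, residual) pairs by x so the halves are low-x and high-x.
    resid = sorted((x, y - (a + b * x)) for x, y in zip(xs, ys))
    half = n // 2
    low = [r for _, r in resid[:half]]
    high = [r for _, r in resid[n - half:]]
    return pvariance(high) / pvariance(low)


# Made-up points: tight around the trend at low x, scattered at high x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 21.0, 29.0, 41.0, 40.0, 70.0, 50.0, 100.0]
ratio = residual_variance_ratio(xs, ys)
print(ratio)
```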

Let’s try to understand why LA has such high costs per hospital discharge.

If we create a scatterplot of average payment per hospital discharge against the number of treated diagnosis groups in a given ZIP code, we’ll notice that there’s generally a positive relationship between these two variables. Again, the data shows quite a bit of heteroscedasticity.
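The two axes of this scatterplot can be derived per ZIP code roughly as follows. This is a sketch under stated assumptions: "diversity" is taken to be the count of distinct diagnosis groups seen in a ZIP code, payment per discharge is discharge-weighted, and the ZIP labels, row layout, and numbers are all placeholders made up for illustration.

```python
from collections import defaultdict


def zip_diversity_and_payment(rows):
    """Return {zip_code: (n_distinct_drgs, payment_per_discharge)}.

    Each row is (zip_code, diagnosis_group, discharges, average_payment).
    """
    drgs = defaultdict(set)
    cost = defaultdict(float)
    discharges = defaultdict(int)
    for zip_code, drg, n, avg_payment in rows:
        drgs[zip_code].add(drg)
        cost[zip_code] += n * avg_payment
        discharges[zip_code] += n
    return {z: (len(drgs[z]), cost[z] / discharges[z]) for z in drgs}


# Placeholder ZIP labels and made-up numbers, not values from the CMS data.
toy_rows = [
    ("ZIP_A", "DRG_1", 10, 10000.0),
    ("ZIP_A", "DRG_2", 10, 14000.0),
    ("ZIP_A", "DRG_3", 20, 12000.0),
    ("ZIP_B", "DRG_1", 10, 6000.0),
]
points = zip_diversity_and_payment(toy_rows)
print(points)
```

Each value in the result is one (x, y) point for the scatterplot.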

Note the really interesting pattern in the second graph: this is the exact same scatterplot as before, but I highlighted all the providers in LA in red. You’ll quickly see that providers in LA generally charge rates that are higher (to much higher) than the trend line.

Looking at the final heat map it’s pretty obvious that providers in LA tend to treat very expensive conditions.

Hmm!

So what have we learned?

  1. The East Coast tends to have pretty high overall costs per provider
  2. California has the highest costs per hospital discharge
  3. Competition in a local market seems to go hand in hand with higher costs (weird!)
  4. A higher diversity in what’s being diagnosed, by ZIP, is associated with higher costs (not really weird: they’re all specialists, and specialists generally cost more)
  5. One reason LA stands out as having exceptionally high costs per discharge is that LA has a high degree of diversity in treatment (high degree of specialization) combined with a large number of highly expensive procedures being performed


That was fun! Next week I’ll be back with some exploratory analysis of the 2013 Tour de France (cycling! A topic close to my heart)!