Design a Medicare Data Power Tool

Fred Trotter

November 2, 2016

1. Get a VRDC seat

CareSet is the first Medicare data vendor, and the ultimate source of our Medicare claims data is the CMS Virtual Research Data Center (VRDC) program. VRDC and its more complex cousin, the Qualified Entity (QE) program, are the official ways that private industry may now commercialize Medicare claims data.

If you are interested in obtaining a VRDC seat of your own, we’d like to make the case for you to work with CareSet instead. A VRDC seat is a tool. We have made it into a power tool. This post outlines what we believe is essential in making this happen, including CareSet’s experience in each category. The steps are not in strict chronological order, though #1 is of course getting the seat, which takes a few months.

2. Handle the technical and liability challenges

CMS expects Medicare data vendors to protect patient privacy in the information they download from VRDC. First, CMS ensures that VRDC downloads comply with CMS policy. Second, CMS places substantial obligations to protect privacy on both the company that holds a VRDC seat and the individuals who actually sit at the computer terminals.

Essentially, VRDC seat holders and CMS are each 100% responsible for protecting patient privacy.

We take this responsibility seriously. We meet and exceed the privacy and cybersecurity requirements that apply under the CMS VRDC program, and more generally under HIPAA. We also research state-of-the-art data aggregation methods to ensure patients cannot be re-identified.

CMS is doing all of this work in parallel with us. However, we like to pretend that CMS is not participating in these efforts, which motivates us to ensure our data is compliant with policy and also damned difficult to re-identify.
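To make the aggregation idea concrete, here is a minimal sketch of small-cell suppression: counting distinct patients per group and withholding any group that is too small to release. It is an illustration only, not CareSet's actual pipeline. The function name, the row layout, and the threshold of 11 are assumptions; the threshold follows the commonly cited CMS cell-size suppression rule, and the exact requirement for any given download should be checked against current CMS policy.

```python
# Assumed threshold: cells representing fewer than 11 patients are withheld.
MIN_CELL_SIZE = 11


def aggregate_with_suppression(rows, min_cell_size=MIN_CELL_SIZE):
    """Count distinct patients per (provider, procedure) and drop small cells.

    `rows` is an iterable of (patient_id, provider, procedure) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    patients_by_cell = {}
    for patient_id, provider, procedure in rows:
        # A set ensures a patient with several claims is counted once.
        patients_by_cell.setdefault((provider, procedure), set()).add(patient_id)

    return {
        cell: len(patient_ids)
        for cell, patient_ids in patients_by_cell.items()
        if len(patient_ids) >= min_cell_size
    }
```

Suppressing the cell entirely, rather than rounding or masking the count, is the simplest choice; it trades some detail for a result that cannot point at a handful of patients.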

3. Be better than Public Use Files

If we only ran basic queries in VRDC, we’d just be competing with the Office of Enterprise Data and Analytics and their continued release of Public Use Files (PUFs).

However, every PUF that CMS or HHS releases has the capacity to become a “Super PUF” if it is normalized and integrated with historical versions and sister datasets. For instance, we have found great benefit in tracking the changes to the NPPES file since its first release in 2007.

A huge amount of our technical infrastructure is devoted to the ongoing processing and merging of hundreds of different public use files, many of which are updated daily. Watching the changes in these very basic data files ensures that we understand where the healthcare system has been and where it is going.
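Tracking changes between releases of a file like NPPES can be sketched as a diff between two snapshots keyed by provider identifier. This is a simplified illustration under stated assumptions: each snapshot has already been loaded into a dictionary keyed by NPI, and the function name and field names are hypothetical rather than the real NPPES column names.

```python
def diff_snapshots(old, new):
    """Compare two snapshots keyed by NPI.

    `old` and `new` map an NPI string to a dict of field values.
    Returns (added, removed, changed), where `changed` maps each NPI
    to {field: (old_value, new_value)} for the fields that differ.
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)

    changed = {}
    for npi in old_ids & new_ids:
        before, after = old[npi], new[npi]
        differences = {
            field: (before.get(field), after.get(field))
            for field in set(before) | set(after)
            if before.get(field) != after.get(field)
        }
        if differences:
            changed[npi] = differences
    return added, removed, changed
```

Storing the per-field differences, instead of only the latest record, is what turns a series of releases into a history of where each provider has been.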

I wish that this processing could be completed entirely by throwing the right engineering buzzwords at the problem. Indeed, our data processing stack is comparable to any that you might find in a data science company. But no amount of Hadoop/Spark/Node/Git/Whatever is going to fix data that was entered wrong in the first place. Beyond what typical ETL pipelines do, our data processing is more than half scrutiny, suffering and surveillance. We have to ensure that the data makes sense in the end.

A perfect example is the considerable work we do to reconcile street addresses across various datasets. When government data is so messed up that there is an Office of Inspector General report about the problem, you can bet that it will be an ongoing headache for anyone who tries to rely on it.
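As a small taste of what address reconciliation involves, the sketch below reduces differently typed versions of the same street address to one comparable key. It is deliberately naive and is not our production method: the function name and the abbreviation table are assumptions for the example, and real reconciliation also has to cope with misspellings, missing suite numbers, and geocoding.

```python
import re

# Hypothetical, deliberately tiny abbreviation table for the example.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "boulevard": "blvd",
    "road": "rd",
    "drive": "dr",
    "suite": "ste",
}


def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common address words."""
    text = raw.lower()
    # Replace anything that is not a letter, digit, underscore, or space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)
```

Two records can then be treated as candidates for the same location when their normalized keys match, which is only the first and easiest step of the reconciliation work.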

4. Create a map of the healthcare system

Once we put in the work to make our Super PUFs, the reward is a map of how healthcare is delivered in the United States. This map enhances our VRDC queries, and the resulting VRDC datasets enhance the map. It’s a beautiful symbiotic relationship.

By uploading our data into VRDC, we enable our VRDC queries to take the map into account. Again, street addresses are the perfect example. It is helpful to examine which services took place in the same location, but that process cannot begin without spending several years ensuring that address data is reliable.

Once we pull down aggregated claims insights from VRDC, we merge them with our map of the healthcare system in order to contextualize the new data.

5. Develop clinical ontologies

The machine learning capacity inside the VRDC system is limited, so it is valuable to conduct machine learning both before data goes into VRDC and after results come out of it.

Most of that work involves leveraging classical medical ontologies (think SNOMED CT and UMLS). But frequently, we use ontologies that consider the core constraint of VRDC: small patient cohorts can’t be downloaded. This is deeply felt in the shift from ICD-9 to ICD-10 for claims justification. While ICD-10 brings more specificity, it also makes claims aggregation more complex. We have recently seen the first quarter of ICD-10 data appear in VRDC, which means that our future analysis will all be broken into two “epochs” of data analysis, with two different generations of diagnosis systems.
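The two-epoch split can be sketched as a simple partition of claims by date of service. This is a minimal illustration, assuming the U.S. ICD-10 compliance date of October 1, 2015 as the boundary; the function names and the claim layout are hypothetical, and a real analysis would more likely rely on the code-set indicator carried on each claim than on the date alone.

```python
from datetime import date

# Assumed boundary: the U.S. ICD-10 compliance date.
ICD10_START = date(2015, 10, 1)


def diagnosis_epoch(service_date):
    """Return which diagnosis system a date of service falls under."""
    return "ICD-10" if service_date >= ICD10_START else "ICD-9"


def split_by_epoch(claims):
    """Partition claims (dicts with a 'service_date' key) into two epochs."""
    epochs = {"ICD-9": [], "ICD-10": []}
    for claim in claims:
        epochs[diagnosis_epoch(claim["service_date"])].append(claim)
    return epochs
```

Keeping the epochs separate avoids comparing counts across two coding systems whose categories do not line up one to one.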

For a brief nonsensical introduction to the implications of ICD-10, I encourage you to read Struck by Orca.

Sometimes the most striking result of these approaches is the combination of provider-level data and clinical data that does not belong to any clinical ontology category. For instance, we can use the Meaningful Use PUFs to calculate which providers leverage EHR systems coded in MUMPS (e.g. Epic and MEDITECH). Does being powered by MUMPS change the way doctors diagnose or prescribe? I doubt it. But those are the types of questions that we can answer by extending PUFs with new levels of semantic meaning, pumping the data back into VRDC, and then downloading the high-level results.

6. Invest in Data Journalism

  CareSet Systems is the sister company of DocGraph, a data journalism organization which opens healthcare data to the public. It is DocGraph that actually holds the VRDC seat, and CareSet has exclusive rights to market the data obtained by DocGraph.    DocGraph also uses VRDC access to create new open datasets. DocGraph created the first Medicare data PUF from a private organization by releasing the “MrPUP” dataset, which details how Medicare providers refer procedures. Currently DocGraph is working with the Cancer Moonshot program to release PUFs that detail how Medicare cancer patients move through the Medicare System.  

Aside from VRDC, DocGraph obtains data by making Freedom of Information Act (FOIA) requests to departments of HHS including CMS, FDA and NLM.

A “DocGraph FOIA” is a great success when the request is the first of its kind, and when a federal agency then provides the resulting data on its website. This means enough people replicated our original request that the government was required to post the information publicly. This happened most famously with the DocGraph teaming dataset.


We can save you time and money, and we can assume the liability associated with a VRDC seat. Our power tool can help:

  1. Pharmaceutical companies understand how to better launch and market their medications.
  2. Payers, Hospitals and ACOs understand how to approach new bundling opportunities.
  3. Providers and healthcare companies see which partners represent opportunities as “taking risk” spreads away from insurers and the government.
  4. And more!

If you’d like to use our data to make money, improve patient care, and have fun doing it, give us a ring!

Edit: DocGraph is now referred to as CareSet Journal.

Fred Trotter

Fred shapes our software development and data gathering strategies, which doesn't stop him from getting elbow-deep in the code on a regular basis. He is co-author of the first Health IT O’Reilly book, Hacking Healthcare, and co-creator of the DIRECT protocol mandated in Meaningful Use. Fred’s technical commentary and data journalism work have been featured in several online and print journals including Wired, Forbes, U.S. News, NPR, Government Health IT, and Modern Healthcare.
