About MUSC CEDAR Core

MUSC Comparative Effectiveness Data Analytics Research (CEDAR) Core

The MUSC Comparative Effectiveness Data Analytics Research (CEDAR) Core is the repository and custodian for several large de-identified billing-record (non-clinical) databases: 1) the Healthcare Cost and Utilization Project (HCUP) data distributed by the Agency for Healthcare Research and Quality (AHRQ); 2) the MarketScan® Commercial Payer, MarketScan Medicare Supplemental Insurance, and 10-state MarketScan Medicaid sample data sets, licensed to MUSC by Truven Health Analytics; and 3) the Medicare Limited Data Set 5% sample, licensed by CMS. The data are available to faculty and staff under the conditions specified in their respective data use agreements (DUAs).

These “mother” data sets are de-identified and are used for many “Big Data” pilot studies, investigator-initiated unfunded studies, and medical resident research. In many cases, use of these data has been deemed “Non-Human Research” by the MUSC IRB. However, all research requests, data access, and final research products from these data must be approved by the CEDAR Directors to ensure DUA compliance.

Most DUAs prohibit linkage with other databases and the reporting of cell sizes smaller than 11 observations. The data do not contain any of the 18 HIPAA identifiers in a form that would allow patients or providers to be identified. Certain data require an application for re-use, or collaboration with CEDAR researchers on ongoing approved CEDAR projects.
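Because the small-cell rule applies to every table or figure produced from these data, it can help to apply it mechanically before results leave the analysis environment. The following is a minimal Python sketch, assuming counts have already been aggregated into a dictionary keyed by cell label. The names `suppress_small_cells` and `MIN_CELL_SIZE` are hypothetical illustrations, not part of CEDAR's SAS program library. It takes the conservative literal reading "fewer than 11" (so zero cells are masked too), and it performs primary suppression only; whether zeros must be masked, or whether complementary suppression is also required, depends on the specific DUA.

```python
from typing import Dict, Union

# Hypothetical constant: most DUAs described above prohibit reporting
# cells with fewer than 11 observations.
MIN_CELL_SIZE = 11


def suppress_small_cells(
    counts: Dict[str, int], min_cell: int = MIN_CELL_SIZE
) -> Dict[str, Union[int, str]]:
    """Return a copy of `counts` with every cell below `min_cell` masked.

    Masked cells are replaced by a label such as "<11" so the table can
    still be printed. This is primary suppression only: it does NOT hide
    other cells that would let a masked value be back-calculated from
    row or column totals (complementary suppression).
    """
    masked: Dict[str, Union[int, str]] = {}
    for cell, n in counts.items():
        if n < 0:
            raise ValueError(f"negative count for {cell!r}: {n}")
        # Keep counts at or above the threshold; mask everything below it.
        masked[cell] = n if n >= min_cell else f"<{min_cell}"
    return masked


if __name__ == "__main__":
    # Hypothetical admission counts by age group (not real CEDAR data).
    table = {"18-44": 152, "45-64": 87, "65-89": 9, ">89": 0}
    print(suppress_small_cells(table))
```

Returning a new dictionary rather than modifying the input keeps the unsuppressed counts available for internal checks while making it explicit which version is safe to report.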

The SC HCUP data sets (years 2014-2020) contain all SC hospital admission, ED visit, and outpatient surgery UB-04 records. They include county codes and race indicators, but age is aggregated into age groups, so the data meet the SC Office of Research and Statistics criteria for de-identified data. The HCUP National Inpatient Sample (NIS) and Nationwide Emergency Department Sample (NEDS) are available under specific DUAs, and reanalysis may be approved by AHRQ. The NIS is available for 1998-2019, and the ED sample for a limited number of years. The HCUP State Inpatient Databases (SID) and State Emergency Department Databases (SED) are available for approximately 15-20 states, for various years from 2010 to 2019. These data may also require AHRQ approval for reanalysis.

The MarketScan® data (years 2016-2020) contain hospital discharge dates, but because they are an approximately 10% sample of the total US commercially insured population, they meet the criteria for de-identification based on statistical sample size. These data contain no race indicators and no county or ZIP code variables. All patient ages over 89 are coded as >89, and hospital discharge-destination variables have been aggregated by the data vendor to ensure that the data meet HIPAA requirements. In addition, the discharge destinations “death” and “incarceration” are blanked out as an extra safeguard so that individuals cannot be identified.

Medicare Part A and Part B Limited Data Set 5% Sample data for 2010-2019 are available for hospital admissions, ED visits, outpatient clinic visits, lab tests, and procedures; however, no prescription files are available. The data are representative of US Medicare beneficiaries and contain variables for age, race, sex, dual eligibility for Medicaid, and county of residence. Use of these data is governed by DUAs.

The CEDAR Core has a large portfolio of validated SAS programs for selecting disease phenotypes (e.g., stroke, MI, diabetes, Alzheimer’s disease, asthma, HIV) and prescription bundles (e.g., depression medications, blood pressure medications, antibiotics), as well as validated programs for assigning comorbidity risk (Charlson and Elixhauser), frailty measures, and stroke severity. Consultation on study design is free; assistance with data extraction, programming, and statistical modeling is available for a fee.