The increasing use of technology in health has enabled the collection and storage of data at an exponential rate. Data is at the heart of “Health Analytics”, where it can be used to understand the current system, project the impact of changes, or evaluate an intervention. Healthcare data are scattered and fragmented. They are available from various sources (governmental or non-governmental health organizations, commercial or non-commercial, nationwide or statewide); recorded at various temporal aggregation levels (event-level, monthly or yearly) and spatial aggregation levels (census tract, zip code, county, state or national); and held with various levels of accuracy. Some data can be called “Big Data”, which is characterized by large Volume (from Terabytes to Exabytes) and Complexity (heterogeneity, depth, dimensionality, dependencies).
In our work, we use a variety of data sources (often combined), along with robust methodological tools from the mathematical sciences, to move along a continuum from Data to Information to Knowledge to Decisions. Data can be useful in understanding 'Who, Where, What, or When' in the system.
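Combining sources usually means bringing them to a common temporal and spatial aggregation level first. The following is a minimal sketch of that roll-up, not a description of any specific pipeline: the `visits` records, the county and ZIP values, and the `aggregate` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical event-level records: (visit date, county, ZIP code).
visits = [
    (date(2012, 1, 5), "Fulton", "30303"),
    (date(2012, 1, 19), "Fulton", "30308"),
    (date(2012, 2, 2), "DeKalb", "30030"),
    (date(2012, 2, 9), "Fulton", "30303"),
]

def aggregate(records, temporal="month", spatial="county"):
    """Count visits at the requested temporal and spatial aggregation levels."""
    counts = Counter()
    for visit_date, county, zip_code in records:
        # Temporal level: "YYYY-MM" for monthly, "YYYY" for yearly.
        period = visit_date.strftime("%Y-%m") if temporal == "month" else str(visit_date.year)
        # Spatial level: county name or ZIP code.
        place = county if spatial == "county" else zip_code
        counts[(period, place)] += 1
    return dict(counts)

monthly_by_county = aggregate(visits)
yearly_by_zip = aggregate(visits, temporal="year", spatial="zip")
```

Once two sources are expressed on the same (period, place) keys, they can be joined on those keys; the coarser of the two source resolutions generally sets the finest level at which the combined data can be analyzed.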
Medical Claims Data
Claims data consist of person-level data on eligibility, service utilization and payments. These data can be used to understand careflow, utilization and cost at the system, organization, provider and patient levels. A central, large repository of claims data is held by the Centers for Medicare & Medicaid Services (CMS) and includes claims for all Medicare- and Medicaid-insured patients across multiple years. These data are developed to support research and policy analysis initiatives for Medicaid and other low-income populations, such as analyzing provider payments, conducting quality or access-to-care studies, and conducting statistical analysis for public reporting.
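As a rough illustration of how the same claim lines can be summarized at different levels, the sketch below computes utilization (number of claims) and cost (total paid) per patient and per provider. The record layout, identifiers and amounts are invented for the example and do not reflect the actual CMS file formats.

```python
from collections import defaultdict

# Hypothetical claim lines: (patient_id, provider_id, service, paid_amount).
claims = [
    ("P1", "DR_A", "office_visit", 80.0),
    ("P1", "DR_B", "er_visit", 650.0),
    ("P2", "DR_A", "office_visit", 80.0),
    ("P1", "DR_A", "office_visit", 95.0),
]

def summarize(claim_lines, level):
    """Return {id: (number of claims, total paid)} at the patient or provider level."""
    index = {"patient": 0, "provider": 1}[level]
    totals = defaultdict(lambda: [0, 0.0])
    for line in claim_lines:
        entry = totals[line[index]]
        entry[0] += 1        # utilization: one more claim for this id
        entry[1] += line[3]  # cost: add the paid amount
    return {key: (n, paid) for key, (n, paid) in totals.items()}

by_patient = summarize(claims, "patient")
by_provider = summarize(claims, "provider")
```

The same grouping idea extends to the organization or system level by adding the corresponding identifier to each claim line and grouping on it.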
Electronic Health Records
EHRs are integrated medical, clinical, administrative and patient-detailed records that could be accessed readily at a wellness visit, as well as at an ER visit, without breaching privacy and confidentiality. Currently, only a few countries have adopted centralized EHR systems. In 2004, President George W. Bush established a national goal of universal adoption of electronic health records and health information exchanges by 2014, although to date the EHR system remains fragmented and varies widely in levels of information, interoperability and accessibility from one health organization to another. One example that could be used as a benchmark for integrated EHRs in the U.S. is the computer system connecting pharmacies with providers. Nearly all pharmacies connect electronically to health plans when they enter a patient’s prescription into their computer system.
Organizations like the Cystic Fibrosis Foundation have established disease registries, which contain patient-level data across many years of service at accredited Cystic Fibrosis clinics. Disease registries are useful in understanding specific diseases and patient outcomes over time.
National & State Databases
Many exist, but the most commonly researched databases are:
- Behavioral Risk Factor Surveillance System (BRFSS) includes some state-level information
- Healthcare Cost and Utilization Project (HCUP) captures information about utilization of services within hospitals for participating states
- Medical Expenditure Panel Survey (MEPS) is useful for understanding expenditures by patients over time
- National Child Health Survey contains a large sample of children
- National Health and Nutrition Examination Survey (NHANES) contains examination data on patients in addition to survey elements
- National Provider Identifier (NPI) registry from CMS contains information on all providers of medical care in the US along with locations and taxonomy codes for specialties
- National Survey of Children's Health (NSCH) provides information on physical and mental health status, access to quality health care, as well as information on the child’s family, neighborhood and social context
- The OASIS system is available online for understanding hospital visits in the state of Georgia
- The Framingham Heart Study was started in 1948 under the direction of the National Heart Institute. Its initial purpose was to identify common factors that contributed to the onset and progression of cardiovascular disease. Later on, the data from the study came to be used for many other studies and analyses.
- The Wisconsin Diabetes Registry Study was funded by the National Institute of Diabetes and Digestive and Kidney Diseases (part of the National Institutes of Health) starting in 1987 to understand the complications and co-morbidities associated with diabetes.
Other Data Sources
- Census Bureau data and the American Community Survey are invaluable for Health Analytics
- National Center for Biotechnology Information is a genomics community providing many genome databases
- Medical technologies (e.g., EEG, CT scan, MRI) are widespread sources of patient monitoring and diagnostic data
- Patient-generated data (e.g., self-reported, tracked through mobile devices, virtual communities) are becoming more common
- ICD-9 Codes are diagnosis codes that provide information on the primary and secondary conditions associated with a healthcare visit
- Diagnosis Related Groups (DRGs) characterize patients by the expected utilization of resources and they are used by many payors to reimburse providers.
- Many others that we do not describe here