Case Study
Customized Data Warehouse Streamlines Operations, Improves Accuracy
- Customer: Large academic medical center on the East Coast
- Challenge: Replace multiple outdated, slow-to-load databases with one highly customized data warehouse
Results
- Established Epic Caboodle data warehouse as the primary data source to facilitate larger custom datasets
- Streamlined operations with easier data loading, next-day data access, and simplified workflows
- Improved accuracy with change data capture and an audit balance control framework
Background and Challenge
To provide researchers and report writers with easy access to high-quality data, a large academic medical center on the East Coast requested a metadata-driven extract, load, and transform (ELT) process to stand up an Observational Medical Outcomes Partnership (OMOP) data warehouse built on the OMOP Common Data Model (CDM). Conforming to the CDM allows insights to be generated with standardized Observational Health Data Sciences and Informatics (OHDSI) tools.
Solution
Concept Mapping and Data Population
Tegria deployed an integrated team of data engineers, Epic data experts, and a data architect to work side by side with the internal teams and set the project up for success. Using Caboodle’s source data, the project team completed source-to-target mapping to the CDM. Tegria created profiles of the source data to allow senior research data analysts to make informed decisions about conforming Caboodle data to the OMOP standard and extending the CDM to include additional data elements from both Caboodle and Clarity databases.
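As an illustration of the kind of source-to-target mapping involved, the sketch below conforms diagnosis rows from a staged Caboodle extract into the OMOP condition_occurrence table using the standard OMOP vocabulary tables. The staging table stg.DiagnosisEvent, its columns, and the omop schema name are hypothetical stand-ins; the actual mappings were defined by the project team during profiling.

```sql
-- Minimal mapping sketch: ICD-10-CM diagnosis codes are translated to standard
-- OMOP concepts through the 'Maps to' relationship in the vocabulary tables.
-- stg.DiagnosisEvent is an illustrative staging table, not a real Caboodle object.
INSERT INTO omop.condition_occurrence (
    condition_occurrence_id, person_id, condition_concept_id,
    condition_start_date, condition_type_concept_id,
    condition_source_value, condition_source_concept_id)
SELECT
    ROW_NUMBER() OVER (ORDER BY d.DiagnosisEventKey),  -- surrogate key for the sketch
    p.person_id,
    COALESCE(std.concept_id, 0),                       -- 0 = no matching standard concept
    d.StartDate,
    32817,                                             -- 'EHR' in the OMOP Type Concept vocabulary
    d.DiagnosisCode,
    src.concept_id
FROM stg.DiagnosisEvent AS d
JOIN omop.person AS p
    ON p.person_source_value = d.PatientSourceId
LEFT JOIN omop.concept AS src
    ON src.concept_code = d.DiagnosisCode
   AND src.vocabulary_id = 'ICD10CM'
LEFT JOIN omop.concept_relationship AS rel
    ON rel.concept_id_1 = src.concept_id
   AND rel.relationship_id = 'Maps to'
LEFT JOIN omop.concept AS std
    ON std.concept_id = rel.concept_id_2
   AND std.standard_concept = 'S';
```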
The project team selected a metadata-driven ELT architecture to minimize development time and long-term maintenance costs. Business Intelligence Markup Language (Biml) was used to automatically generate the ELT packages that export and load the data.
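The metadata behind that generation might look something like the hypothetical table below: each row describes one source-to-target load, and a Biml script or dynamic SQL iterates over the rows to emit the corresponding ELT package.

```sql
-- Hypothetical metadata table driving package generation; names are illustrative.
CREATE TABLE meta.LoadDefinition (
    LoadDefinitionId  int IDENTITY(1,1) PRIMARY KEY,
    SourceDatabase    sysname      NOT NULL,             -- e.g. Caboodle or Clarity
    SourceObject      sysname      NOT NULL,             -- source table or view
    TargetSchema      sysname      NOT NULL,             -- staging, persisted source, or OMOP layer
    TargetObject      sysname      NOT NULL,
    LoadType          varchar(10)  NOT NULL,             -- 'TYPE1' or 'TYPE2'
    LoadPriority      int          NOT NULL DEFAULT 100, -- used for scheduling
    EstimatedRowCount bigint       NULL,                 -- used for size-based ordering
    IsEnabled         bit          NOT NULL DEFAULT 1
);
```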
Staging and Validating Source Data
Once the scripts were created, data analysts staged and validated the source data. Loading the legacy data helped ensure that the system was configured to work with the existing implementation and informed the development of features that would enable an efficient, scalable architecture.
Large datasets can present performance challenges, so it was important to have an ELT process that loads data daily into a staging database, preserves it in a persisted source database, conforms it to the OMOP format, and ultimately loads it into the reporting database. A business rule framework was created to perform a variety of data quality checks and validations across multiple tables.
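A minimal sketch of one way such a rule framework can be organized, with illustrative names: each rule row holds a parameterized check that returns a failure count, and a small runner executes the checks with dynamic SQL and logs the results for review.

```sql
-- Illustrative business rule framework; table and procedure names are assumptions.
CREATE TABLE meta.DataQualityRule (
    RuleId      int IDENTITY(1,1) PRIMARY KEY,
    TargetTable sysname       NOT NULL,
    RuleName    varchar(200)  NOT NULL,
    CheckSql    nvarchar(max) NOT NULL   -- e.g. N'SELECT @cnt = COUNT(*) FROM omop.person WHERE year_of_birth IS NULL'
);

CREATE TABLE meta.DataQualityResult (
    ResultId     int IDENTITY(1,1) PRIMARY KEY,
    RuleId       int       NOT NULL,
    FailureCount bigint    NOT NULL,
    CheckedAt    datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE PROCEDURE meta.RunDataQualityRules
AS
BEGIN
    DECLARE @RuleId int, @CheckSql nvarchar(max), @Failures bigint;

    DECLARE rule_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT RuleId, CheckSql FROM meta.DataQualityRule;

    OPEN rule_cursor;
    FETCH NEXT FROM rule_cursor INTO @RuleId, @CheckSql;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Each CheckSql assigns its failure count to the @cnt output parameter.
        EXEC sp_executesql @CheckSql, N'@cnt bigint OUTPUT', @cnt = @Failures OUTPUT;

        INSERT INTO meta.DataQualityResult (RuleId, FailureCount)
        VALUES (@RuleId, @Failures);

        FETCH NEXT FROM rule_cursor INTO @RuleId, @CheckSql;
    END
    CLOSE rule_cursor;
    DEALLOCATE rule_cursor;
END;
```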
Improving Efficiency
Previously, databases were loaded infrequently due to the large volumes of data, slow loading times, and conflicting reporting schedules. To solve this issue going forward, Tegria recommended a multi-threaded approach that runs multiple data packages at the same time, with built-in prioritization based on package size and timing.
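One common way to implement that kind of prioritized, parallel dispatch is a work-queue table like the hypothetical sketch below: each worker thread atomically claims the highest-priority pending package, and the READPAST hint lets other threads skip rows that are already claimed.

```sql
-- Illustrative package queue; names and columns are assumptions, not the actual design.
CREATE TABLE meta.PackageQueue (
    QueueId       int IDENTITY(1,1) PRIMARY KEY,
    PackageName   sysname     NOT NULL,
    LoadPriority  int         NOT NULL,               -- lower value runs earlier
    EstimatedRows bigint      NOT NULL,               -- larger loads start first within a priority
    RunStatus     varchar(10) NOT NULL DEFAULT 'PENDING'
);
GO

-- Executed by each worker thread to claim its next package.
;WITH NextPackage AS (
    SELECT TOP (1) QueueId, PackageName, RunStatus
    FROM   meta.PackageQueue WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE  RunStatus = 'PENDING'
    ORDER BY LoadPriority, EstimatedRows DESC
)
UPDATE NextPackage
SET    RunStatus = 'RUNNING'
OUTPUT inserted.QueueId, inserted.PackageName;
```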
To provide efficient access to the data, a dynamically generated set of user access views (UAVs) was created daily for multiple access modes, including the standard, protected health information (PHI), and de-identified data models, as well as patient cohorts defined by a researcher. To support this variety of data models, the team developed complex de-identification logic covering dates, the ages of elderly patients, and locations. The logic also included record-level exclusion flags to identify records that cannot be included in views, such as patients with a special status (e.g., celebrities or employees), Community Connect patients, HIV patients, and substance abuse patients.
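A de-identified UAV might take a shape like the hypothetical sketch below, with a per-patient date offset, an age cap for elderly patients, withheld locations, and a join that drops flagged records; in the actual solution these views were generated dynamically each day for every access mode and researcher-defined cohort.

```sql
-- Illustrative de-identified view; the deid schema, offset table, and exclusion
-- table are assumptions standing in for the generated objects.
CREATE VIEW uav.person_deid
AS
SELECT
    p.person_id,
    CASE WHEN YEAR(GETDATE()) - p.year_of_birth > 89
         THEN YEAR(GETDATE()) - 90                       -- report elderly patients as 90+
         ELSE p.year_of_birth END                        AS year_of_birth,
    DATEADD(DAY, o.date_shift_days, p.birth_datetime)    AS birth_datetime,  -- per-patient date shift
    CAST(NULL AS int)                                    AS location_id      -- locations withheld
FROM omop.person AS p
JOIN deid.person_offset AS o
    ON o.person_id = p.person_id
LEFT JOIN deid.person_exclusion AS x                     -- special status, Community Connect, HIV,
    ON x.person_id = p.person_id                         -- and substance abuse flags
WHERE x.person_id IS NULL;                               -- drop excluded records
```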
Tegria data engineers also developed and deployed a metadata-driven flat file loader that uses dynamic SQL to generate tables and stored procedures and Biml to generate the SSIS packages. Source data is preserved in the persisted source layer, and the process handles file archiving and high-level execution auditing. The generated stored procedures conform to the project’s general design specification and can be configured as type 1 (no change history retained) or type 2 (change history retained through versioning).
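For the type 2 option, the generated procedures would follow a versioning pattern along these lines (tables and columns are illustrative, not the project’s actual objects): the current version of a changed row is closed out and a new version inserted, whereas a type 1 load simply overwrites values in place. Retaining versions this way keeps change history available for downstream auditing.

```sql
-- Illustrative type 2 load. stg.PatientAttribute holds today's extract;
-- psa.PatientAttribute is the persisted, versioned history.
CREATE TABLE stg.PatientAttribute (
    PatientKey     int          NOT NULL,
    AttributeValue varchar(200) NULL,
    RowHash        binary(32)   NOT NULL     -- hash of the tracked columns
);

CREATE TABLE psa.PatientAttribute (
    PatientKey     int          NOT NULL,
    AttributeValue varchar(200) NULL,
    RowHash        binary(32)   NOT NULL,
    ValidFrom      datetime2    NOT NULL,
    ValidTo        datetime2    NULL,
    IsCurrent      bit          NOT NULL
);
GO

-- Close out current versions whose tracked values have changed.
UPDATE tgt
SET    tgt.IsCurrent = 0,
       tgt.ValidTo   = SYSUTCDATETIME()
FROM   psa.PatientAttribute AS tgt
JOIN   stg.PatientAttribute AS src
       ON src.PatientKey = tgt.PatientKey
WHERE  tgt.IsCurrent = 1
  AND  tgt.RowHash  <> src.RowHash;

-- Insert a new current version for new rows and for rows just closed out.
INSERT INTO psa.PatientAttribute (PatientKey, AttributeValue, RowHash, ValidFrom, ValidTo, IsCurrent)
SELECT src.PatientKey, src.AttributeValue, src.RowHash, SYSUTCDATETIME(), NULL, 1
FROM   stg.PatientAttribute AS src
LEFT JOIN psa.PatientAttribute AS tgt
       ON tgt.PatientKey = src.PatientKey
      AND tgt.IsCurrent  = 1
WHERE  tgt.PatientKey IS NULL;
```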