We collaborated with a prominent client in the entertainment industry to develop a robust data warehouse solution, designed to handle extensive amounts of sensor and machine-generated data from multiple MySQL databases and tables, ultimately delivering actionable insights and improved operational efficiencies.
Our approach involved implementing ETL pipelines using Apache Airflow to automate and orchestrate the process, eliminating manual steps. This enabled continuous, streamlined migration, transformation, and automated deletion of data from MySQL into BigQuery, allowing the client to add new tables or columns seamlessly as their data requirements grew.
The solution included an incremental loading model and partitioned tables within BigQuery to optimize data retrieval, which significantly enhanced the performance of queries and minimized processing times. The project not only helped the client extract and utilize actionable insights in real-time but also improved the efficiency of their laser tag systems, ultimately delivering a better guest experience.
Before Implementation
Managing large data volumes across multiple MySQL databases was labor-intensive, with limited support for schema changes.
Data migration to BigQuery was slow and inconsistent, hindering access to insights.
MySQL storage filled quickly, with manual deletions adding inefficiency and risking data backlog.
Key Improvements
Streamlined ETL workflows via Airflow, allowing for efficient data movement, backups, and deletion processes.
Utilized partitioned tables and incremental data loads, enhancing query performance.
Implemented robust data validation and backup via Kafka and Delta Portal Backup Stream, ensuring accuracy in deletions.
Project Overview
1.
Pipeline Development
Designed and implemented ETL pipelines using Apache Airflow, orchestrating data extraction from MySQL and storing it temporarily in Google Cloud Storage for streamlined processing.
2.
Data Migration
Loaded historical data with a full-load pipeline, then transitioned to an incremental pipeline for timely and efficient data updates, optimizing storage and retrieval.
3.
Data Deletion
Developed a deletion pipeline that periodically removes selected data from MySQL after validating successful ingestion in BigQuery, reducing storage costs.
4.
Backup and Monitoring
Activated Google Data Stream to track all MySQL changes in real-time, enabling comprehensive backup and monitoring for data integrity across the pipelines.
Airflow
MySQL
DataStream
Bigquery
Technologies used
Doug Willems
CEO - Delta Strike International
"This solution has transformed how we handle data. With automated ETL processes and a highly scalable system, we can seamlessly access critical insights and optimize our operations. The automated data deletion and validation give us confidence in our data integrity, and the efficiency gains are invaluable. We couldn't have asked for a better partner to bring this project to life."