A production data platform built on HashiCorp Nomad, running Apache Airflow with Vault-backed connection strings. ETL DAGs ingest from HR, Records, and Service Desk databases into a central warehouse surfaced through Metabase. Automated nightly backups ship to Backblaze B2 — covering every production database.
My Role
DevOps & Data Platform Engineer
Duration
6 weeks · 2025
Context
Ghana School of Law
Outcome
Analytics platform live · 100% database backup coverage · zero credentials in DAG code
Stack
HashiCorp Nomad · Apache Airflow · HashiCorp Vault · PostgreSQL · Metabase · Backblaze B2
Context
The school had multiple PostgreSQL databases (HR, records, service desk) with no analytics layer and no backup strategy. Business decisions were made without data.
The Pain
No visibility into school operations. Manual reporting took days. Any database server failure would result in permanent, unrecoverable data loss.
Why It Mattered
Student enrollment data, exam records, and HR information with no disaster recovery — a single drive failure away from catastrophic institutional data loss.
Technical Goals
Constraints
Airflow deployed as Nomad services (scheduler, webserver, workers) with Vault-backed connection URIs. A pull-based DAG sync model isolates analyst workflows from infrastructure. Backup jobs run as Nomad periodic tasks — one per database, independent failure domains.
Apache Airflow
DAG orchestration — schedules and executes ETL and backup tasks
HashiCorp Vault
Supplies database connection strings at task execution time — zero credentials stored in Airflow or DAG files
PostgreSQL Sources
HR, Records Management, and Service Desk source databases feeding the ETL pipeline
Metabase
BI dashboard layer surfacing warehouse data to school leadership and operations staff
Backblaze B2
Off-site backup destination — encrypted nightly dumps from every production database
→ Pull-based DAG sync from a separate analyst repository
Analysts own the DAG repository independently. A cron job on the Airflow worker pulls from the analyst repo — analysts can update and deploy DAGs without ever needing cluster access.
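A minimal sketch of what that sync step could look like, assuming a cron-driven script on the worker; the repository URL, paths, and the use of rsync are illustrative rather than the production values:

```python
#!/usr/bin/env python3
"""Pull the analyst DAG repository and publish it into Airflow's DAG folder.

Run from cron on the Airflow worker. The repository URL and paths below
are illustrative, not the production values.
"""
import subprocess
from pathlib import Path

REPO_URL = "git@git.example.edu:analytics/airflow-dags.git"  # hypothetical
CHECKOUT = Path("/opt/airflow/dag-sync/repo")
DAGS_DIR = Path("/opt/airflow/dags")


def sync() -> None:
    if (CHECKOUT / ".git").exists():
        # Fast-forward the existing checkout to the analysts' latest commit.
        subprocess.run(["git", "-C", str(CHECKOUT), "pull", "--ff-only"], check=True)
    else:
        CHECKOUT.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", REPO_URL, str(CHECKOUT)], check=True)

    # Copy DAG files into the folder Airflow scans; --delete drops DAGs the
    # analysts have retired, and .git metadata stays out of the DAG folder.
    subprocess.run(
        ["rsync", "-a", "--delete", "--exclude=.git",
         f"{CHECKOUT}/", f"{DAGS_DIR}/"],
        check=True,
    )


if __name__ == "__main__":
    sync()
```

Because the script only reads from Git and writes to the DAG folder, nothing in the analyst workflow ever touches Nomad or Vault.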
→ Vault-backed connection URIs over Airflow's connections UI
Airflow's connections UI stores credentials in its metadata database. Vault injection means credentials are fetched at task run time — not persisted anywhere in Airflow, not visible in logs.
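One way to do that runtime fetch, sketched with the hvac client; the mount point, secret path, and key name are assumptions, and Airflow's HashiCorp provider offers an equivalent secrets-backend integration that resolves connection IDs the same way:

```python
import os

import hvac
from sqlalchemy import create_engine


def engine_for(source: str):
    """Build a SQLAlchemy engine from a connection URI that lives only in Vault.

    `source` is one of "hr", "records", or "service_desk". The mount point,
    path layout, and "uri" key are illustrative.
    """
    client = hvac.Client(
        url=os.environ["VAULT_ADDR"],
        token=os.environ["VAULT_TOKEN"],  # short-lived token injected per task
    )
    secret = client.secrets.kv.v2.read_secret_version(
        mount_point="kv",
        path=f"airflow/connections/{source}",
    )
    # The URI never touches the DAG file, Airflow's metadata DB, or task logs.
    return create_engine(secret["data"]["data"]["uri"])
```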
Airflow scheduler, webserver, and Celery workers deployed as separate Nomad services with Vault-injected secrets.
Incremental ETL DAGs pulling from source databases into a central warehouse schema daily.
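A stripped-down sketch of one such DAG, written against Airflow 2.x TaskFlow; the table, watermark column, and connection IDs are illustrative, and the connection IDs are assumed to resolve through the Vault secrets backend rather than the metadata database:

```python
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook
from sqlalchemy import text


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False, tags=["etl"])
def hr_to_warehouse():
    """Incrementally copy changed HR rows into the warehouse staging schema."""

    @task
    def extract_load(data_interval_start=None, data_interval_end=None):
        # Airflow injects this run's data interval; only rows touched inside it move.
        src = PostgresHook(postgres_conn_id="hr_source").get_sqlalchemy_engine()
        dwh = PostgresHook(postgres_conn_id="warehouse").get_sqlalchemy_engine()

        df = pd.read_sql(
            text("SELECT * FROM employees "
                 "WHERE updated_at >= :lo AND updated_at < :hi"),
            src,
            params={"lo": data_interval_start, "hi": data_interval_end},
        )
        df.to_sql("hr_employees", dwh, schema="staging",
                  if_exists="append", index=False)

    extract_load()


hr_to_warehouse()
```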
Nomad periodic jobs run pg_dump on a nightly schedule, encrypt the dumps, and ship them to Backblaze B2.
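A sketch of what each periodic task might execute; the bucket, endpoint region, GPG recipient, and environment variable names are illustrative, and B2 is reached here through its S3-compatible API via boto3:

```python
"""Nightly backup task: pg_dump -> gpg encrypt -> upload to Backblaze B2.

Intended to run as a Nomad periodic job, one instance per database.
Names, paths, and the GPG recipient below are illustrative.
"""
import datetime
import os
import subprocess

import boto3

DB_NAME = os.environ["BACKUP_DB_NAME"]        # e.g. "hr"
DB_URI = os.environ["BACKUP_DB_URI"]          # injected from Vault by the job
GPG_RECIPIENT = os.environ["BACKUP_GPG_KEY"]  # public key used for encryption
BUCKET = os.environ.get("BACKUP_BUCKET", "school-db-backups")

stamp = datetime.datetime.utcnow().strftime("%Y-%m-%d")
dump_path = f"/tmp/{DB_NAME}-{stamp}.sql.gz"
enc_path = dump_path + ".gpg"

# 1. Dump and compress. --no-owner keeps the dump restorable on any server.
with open(dump_path, "wb") as out:
    dump = subprocess.Popen(["pg_dump", "--no-owner", DB_URI], stdout=subprocess.PIPE)
    subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
    dump.stdout.close()
    if dump.wait() != 0:
        raise RuntimeError("pg_dump failed")

# 2. Encrypt before the bytes ever leave the box.
subprocess.run(
    ["gpg", "--batch", "--yes", "--recipient", GPG_RECIPIENT,
     "--output", enc_path, "--encrypt", dump_path],
    check=True,
)

# 3. Ship to Backblaze B2 via its S3-compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",  # illustrative region
    aws_access_key_id=os.environ["B2_KEY_ID"],
    aws_secret_access_key=os.environ["B2_APP_KEY"],
)
s3.upload_file(enc_path, BUCKET, f"{DB_NAME}/{os.path.basename(enc_path)}")
```

Keeping one job per database means a failed HR dump never blocks the Records or Service Desk backups.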
The Problem
Airflow runs database migrations on startup. On Nomad, after a cluster restart, the scheduler and webserver can start simultaneously — both attempt the migration and one fails with a lock conflict.
The Fix
Added a dedicated pre-start migration Nomad job that runs to completion before the main Airflow services. Nomad lifecycle prestart hooks ensure the migration finishes before any Airflow process starts.
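The prestart job itself can be a one-shot task wrapping Airflow's own CLI; a minimal sketch, assuming Airflow 2.x (the command is `airflow db migrate` on 2.7+ and `airflow db upgrade` on earlier releases):

```python
"""Entrypoint for the one-shot migration job Nomad runs as a prestart task.

The scheduler and webserver tasks only start after this exits 0, so exactly
one process ever runs the schema migration.
"""
import subprocess
import sys
import time

# Wait for the metadata database to accept connections; a fresh cluster boot
# can race the Postgres allocation as well.
for attempt in range(30):
    if subprocess.run(["airflow", "db", "check"]).returncode == 0:
        break
    time.sleep(5)
else:
    sys.exit("metadata database never became reachable")

# Apply migrations once, before any other Airflow process starts.
# On Airflow < 2.7 the equivalent command is `airflow db upgrade`.
subprocess.run(["airflow", "db", "migrate"], check=True)
```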
Gave the school its first analytics platform and removed the risk of permanent, unrecoverable data loss across all production databases.
Before → After
Database Backup Coverage: none → 100%, encrypted nightly dumps to B2
Analytics Reporting: manual, taking days → live Metabase dashboards
Credentials in DAG code: zero, with connections supplied by Vault at task run time
Business Outcome
School leadership has live dashboards for enrollment, HR, and service desk operations. Any database can be restored to the previous night's backup within 30 minutes.
Would Do Differently
Key Takeaways