Data Pipelines
Pipeline Orchestration Forge
DAG design, idempotent transforms, and lineage-friendly orchestration for batch + incremental ML pipelines.
- Duration: 4 weeks · online
- Format: Async + live reviews
- Level: Intermediate
- Tuition (informational): ₩1,100,000
Program narrative
We implement DAGs with explicit SLAs, data quality gates, and rollback semantics. Labs include a simulated upstream schema change on a Friday afternoon.
What is included
- DAG layout rubric emphasizing fan-out safety
- Lineage capture with OpenLineage-compatible events
- Data quality contracts with soft vs. hard fails
- Blue/green dataset promotion checklist
- Cost attribution tags per pipeline node
- Incident runbook starter pack
- Mentor pairing on your DAG sketch
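To make the soft vs. hard fail distinction in the data quality contracts concrete, here is a minimal Python sketch. It is not the course's lab code: the names `Check`, `run_gate`, `HardFailError`, and the two example checks are hypothetical, chosen only for illustration. The assumption is that a hard fail stops the run, while a soft fail is recorded as a warning and the run continues.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]


class HardFailError(Exception):
    """Raised when a hard check fails and the run must stop."""


@dataclass(frozen=True)
class Check:
    name: str
    predicate: Callable[[Rows], bool]  # returns True when the data passes
    hard: bool                         # True: block the run; False: warn only


def run_gate(rows: Rows, checks: list[Check]) -> list[str]:
    """Evaluate checks in order; return the names of soft checks that failed."""
    warnings = []
    for check in checks:
        if check.predicate(rows):
            continue
        if check.hard:
            # Hard fail: stop here so downstream nodes never see bad data.
            raise HardFailError(check.name)
        # Soft fail: record it and keep going.
        warnings.append(check.name)
    return warnings


# Hypothetical contract for an orders table (illustrative only).
CHECKS = [
    Check(
        "order_id_not_null",
        lambda rows: all(r.get("order_id") is not None for r in rows),
        hard=True,
    ),
    Check("row_count_at_least_3", lambda rows: len(rows) >= 3, hard=False),
]
```

In an orchestrator, a gate like this would typically sit as its own task between a transform and its consumers, with the returned warnings forwarded to whatever alerting channel the team already uses.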
Outcomes you can demo
- Promote a dataset version with reversible steps
- Instrument lineage events consumable by your catalog
- List rollback owners and expected recovery time
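As a rough picture of what "reversible steps" can mean for dataset promotion, the sketch below models a blue/green switch as a pointer flip with a recorded history. It is an illustration under stated assumptions, not the program's checklist: `DatasetAlias`, `promote`, and `rollback` are hypothetical names, and in a warehouse the pointer would more likely be a view or table alias than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetAlias:
    """Tracks which dataset version consumers read, plus prior versions."""

    live: str
    history: list[str] = field(default_factory=list)

    def promote(self, candidate: str) -> None:
        """Point consumers at `candidate`; safe to call twice (idempotent)."""
        if candidate == self.live:
            return  # re-running the promotion step changes nothing
        self.history.append(self.live)  # remember where to roll back to
        self.live = candidate

    def rollback(self) -> str:
        """Restore the previously live version and return its name."""
        if not self.history:
            raise RuntimeError("no previous version to roll back to")
        self.live = self.history.pop()
        return self.live
```

Because the old version is never overwritten, only un-pointed, a rollback is the same cheap operation as a promotion, which is what keeps expected recovery time short and easy to state.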
Mentor of record
Camille Dubois
Former SRE turned ML platform lead; advocates for boring, readable DAGs.
Participant questions
Airflow vs Dagster?
We teach both at the level of orchestration concepts; you pick one for the capstone, subject to mentor approval.
Snowflake required?
Labs default to Snowflake-compatible SQL; you may adapt to BigQuery with extra self-led work.
Out of scope?
Real-time streaming orchestration belongs in the Streaming Data Mesh Lab, not here.
Recent participant notes
“Pipeline Orchestration Forge’s blue/green dataset checklist is taped above my monitor. Boring in the best way.”