Data Pipelines

Pipeline Orchestration Forge

DAG design, idempotent transforms, and lineage-friendly orchestration for batch + incremental ML pipelines.
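One common way to make a batch transform idempotent is to recompute a whole partition deterministically and overwrite it, never append to it, so a retry for the same run date leaves storage unchanged. The minimal sketch below uses an in-memory dict as a stand-in warehouse; `daily_revenue`, `Warehouse`, and the row fields are hypothetical names for illustration, not course material.

```python
from collections import defaultdict

# Hypothetical in-memory "warehouse": table name -> partition key -> rows.
Warehouse = dict[str, dict[str, list[dict]]]


def daily_revenue(warehouse: Warehouse, source_rows: list[dict], run_date: str) -> None:
    """Recompute one day's per-customer revenue and overwrite that partition.

    Deterministic output plus overwrite (not append) means rerunning for the
    same run_date leaves the warehouse in exactly the same state.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in source_rows:
        if row["date"] == run_date:
            totals[row["customer"]] += row["amount"]
    # Sort so the output does not depend on input row order.
    result = [{"customer": c, "revenue": totals[c]} for c in sorted(totals)]
    # Replace the whole partition in one assignment; never extend it.
    warehouse.setdefault("daily_revenue", {})[run_date] = result
```

Running `daily_revenue` twice for the same `run_date`, for example after a failed downstream task triggers a retry, produces the same partition instead of doubled totals.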

Duration
4 weeks · online
Format
Async + live reviews
Level
Intermediate
Tuition (informational)
₩1,100,000

Program narrative

We implement DAGs with explicit SLAs, data quality gates, and rollback semantics. Labs include a simulated upstream schema change on a Friday afternoon.
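To make "explicit SLAs" concrete, here is a small sketch of a runner that executes tasks in dependency order and reports which ones exceeded their per-node SLA. It uses Python's standard `graphlib.TopologicalSorter`; the `Task` type, its `sla_seconds` field, and `run_dag` are hypothetical. Measuring SLA as task wall-clock duration is an assumption here; production orchestrators often measure it relative to the scheduled run time instead.

```python
import time
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Task:
    name: str
    fn: Callable[[], object]
    sla_seconds: float              # hypothetical explicit per-node SLA
    upstream: tuple[str, ...] = ()  # names of tasks that must finish first


def run_dag(tasks: list[Task]) -> list[str]:
    """Run tasks in dependency order; return names of tasks that breached their SLA.

    Every name listed in `upstream` must also be defined as a Task.
    """
    by_name = {t.name: t for t in tasks}
    # graphlib expects a mapping of node -> its predecessors.
    graph = {t.name: set(t.upstream) for t in tasks}
    breaches: list[str] = []
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        start = time.monotonic()
        task.fn()
        if time.monotonic() - start > task.sla_seconds:
            breaches.append(name)
    return breaches
```

Keeping the SLA on the node definition itself, rather than in a separate alerting config, keeps it visible during DAG review.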

What is included

  • DAG layout rubric emphasizing fan-out safety
  • Lineage capture with OpenLineage-compatible events
  • Data quality contracts with soft vs. hard fails
  • Blue/green dataset promotion checklist
  • Cost attribution tags per pipeline node
  • Incident runbook starter pack
  • Mentor pairing on your DAG sketch
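A data quality contract with soft vs. hard fails can be sketched as a list of checks, each flagged as blocking (hard) or warning-only (soft). In the minimal version below, a renamed upstream column, like the Friday schema-change lab, trips a hard check and stops the run, while a modest null rate only produces a warning. `Check`, `enforce`, `ContractViolation`, the column names, and the 5% threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Check:
    name: str
    passes: Callable[[list[Row]], bool]
    hard: bool  # hard fail blocks the pipeline; soft fail only warns


class ContractViolation(Exception):
    pass


def enforce(rows: list[Row], checks: list[Check]) -> list[str]:
    """Run every check; raise if any hard check fails, else return soft-failure names."""
    hard_failures, soft_failures = [], []
    for check in checks:
        if not check.passes(rows):
            (hard_failures if check.hard else soft_failures).append(check.name)
    if hard_failures:
        raise ContractViolation(f"hard checks failed: {hard_failures}")
    return soft_failures


EXPECTED_COLUMNS = {"order_id", "amount", "country"}  # hypothetical contract schema

checks = [
    # Hard: an upstream schema change (renamed or dropped column) must stop the run.
    Check("schema", lambda rows: all(set(r) == EXPECTED_COLUMNS for r in rows), hard=True),
    # Hard: the primary key must be unique.
    Check("unique_order_id",
          lambda rows: len({r.get("order_id") for r in rows}) == len(rows), hard=True),
    # Soft: tolerate a few missing countries, but surface them.
    Check("country_null_rate<5%",
          lambda rows: sum(r.get("country") is None for r in rows) <= 0.05 * len(rows),
          hard=False),
]
```

The checks use `r.get(...)` so that a missing column is reported by the schema check instead of crashing the other checks with a `KeyError`.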

Outcomes you can demo

  • Promote a dataset version with reversible steps
  • Instrument lineage events consumable by your catalog
  • List rollback owners and expected recovery time
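Reversible promotion usually rests on one idea: readers query a stable alias, and promotion only moves that alias from the current (blue) version to the new (green) one while keeping the old version intact. The sketch below models that with a hypothetical `DatasetAlias` class; in a real warehouse the alias might be a view or catalog pointer, which this example does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetAlias:
    """Hypothetical catalog entry: readers always query `live`, never a version directly."""
    name: str
    live: Optional[str] = None
    history: list[str] = field(default_factory=list)  # previously live versions

    def promote(self, green: str) -> None:
        # Keep the old (blue) version instead of dropping it, so the swap can be undone.
        if self.live is not None:
            self.history.append(self.live)
        self.live = green

    def rollback(self) -> str:
        # Point readers back at the most recent previous version.
        if not self.history:
            raise RuntimeError(f"{self.name}: nothing to roll back to")
        self.live = self.history.pop()
        return self.live
```

Because promotion and rollback are both a single pointer move, recovery time depends mostly on noticing the problem, not on rebuilding data.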

Mentor of record

Camille Dubois

Former SRE turned ML platform lead; advocates for boring, readable DAGs.

Participant questions

Airflow vs Dagster?

We teach both at the level of orchestration concepts; you pick one for the capstone, with mentor approval.

Snowflake required?

Labs default to Snowflake-compatible SQL; you may adapt them to BigQuery with extra self-led work.

Out of scope?

Real-time streaming orchestration belongs in the Streaming Data Mesh Lab, not here.

Recent participant notes

“Pipeline Orchestration Forge’s blue/green dataset checklist is taped above my monitor. Boring in the best way.”
— Owen · Data platform · 5/5 · survey