orchestration · lineage · dag
When DAG simplicity beats clever parallelism
Camille Dubois · 2025-01-15
Fan-out feels like free throughput until 3 a.m. when a single bad partition blocks promotion of a dataset every downstream team needs. We teach a litmus test: if you cannot explain the rollback order on a whiteboard in two minutes, your DAG is too clever for the on-call rotation you actually have.
That does not mean serial pipelines forever. It means explicit stages with contracts—schema checkpoints, row count floors, and owner tags on each node. We model a staged failure where an upstream vendor ships duplicate keys; the DAG should fail loudly at the contract gate instead of silently duplicating training rows.
We close with a note on observability: lineage events are cheap compared to retraining. Emitting OpenLineage-compatible payloads on every materialization pays off the first time finance asks which revenue numbers included a late-arriving refund table. The habit is boring; the incidents avoided are not.