notebooks · reproducibility · training

Notebook hygiene before you invite GPUs

Haneul Min · 2024-11-20

Hands adjusting patch cables under soft desk lighting

Shared GPU images rot quickly when every participant pins a different minor version of a scientific stack. We enforce a base image hash per cohort and require pull requests for dependency bumps with a short rationale. That sounds bureaucratic until you watch a class unblock in minutes because everyone is literally on the same CUDA driver surface.

We also ban silent notebook outputs in shared repositories. Clearing outputs before commit is not about tidiness—it prevents megabyte-sized JSON blobs from masking real diffs in review. Instructors model this by walking through a deliberately broken notebook where a hidden output cached an old dataframe schema.

The last section covers collaboration etiquette: naming cells after hypotheses, attaching a one-line expected behavior comment, and linking to experiment IDs in the registry. These habits transfer directly to production notebooks that feed scheduled jobs, which is the point—we are rehearsing seriousness before budgets climb.

← All field notes