vectors · deployment · latency
Canary routing for vector index swaps
Elias Cho · 2025-06-03
Swapping a dense vector index at midnight is tempting until you realize your autoscaler has never seen the new embedding dimension mix. We teach a canary lane that uses a deterministic hash of session IDs to route a fixed slice of sessions through the candidate index, while a shadow comparator tracks latency and recall@k on a sample of held-out queries.
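A minimal sketch of the deterministic routing idea, assuming session IDs are strings and the canary lane is expressed as a percentage of sessions. The names bucket_for, route, CANARY_PERCENT, and the salt value are hypothetical and not taken from the course material; the point is only that the same session always lands in the same lane.

```python
import hashlib

# Hypothetical share of sessions sent to the candidate index.
CANARY_PERCENT = 5


def bucket_for(session_id: str, salt: str = "index-swap-v1") -> int:
    """Map a session ID to a stable bucket in [0, 100).

    The salt is an assumed per-rollout label so that successive
    rollouts do not always pick the same sessions.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % 100


def route(session_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return "candidate" for sessions in the canary lane, else "baseline"."""
    if bucket_for(session_id) < canary_percent:
        return "candidate"
    return "baseline"
```

A cryptographic hash is used here rather than Python's built-in hash(), because hash() on strings is randomized per process and would not give the same lane across workers or restarts.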
The approach assumes you already log query text while respecting consent boundaries. If logging is sparse, the canary becomes a toy, so we spend time upfront aligning with privacy counsel on redaction hooks. Week three of the course adds a budgeted load generator that replays anonymized queries, so the canary does not depend entirely on organic traffic spikes.
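One way to read "budgeted" is a hard cap on how many queries are replayed, plus a fixed send rate. The sketch below makes that assumption; replay, max_queries, qps, and the send callback are hypothetical names, and the queries are assumed to be anonymized before they reach this function.

```python
import time
from typing import Callable, Iterable


def replay(
    queries: Iterable[str],
    send: Callable[[str], None],
    max_queries: int,
    qps: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Replay queries at a fixed rate and stop once the budget is spent.

    Returns the number of queries actually sent.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    interval = 1.0 / qps  # seconds between consecutive sends
    sent = 0
    for query in queries:
        if sent >= max_queries:
            break  # budget exhausted: leave the remaining queries unsent
        send(query)
        sent += 1
        sleep(interval)
    return sent
```

Passing sleep as a parameter keeps the function testable without real waiting; in a live run the default time.sleep applies.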
In production, we document abort criteria: if P99 latency exceeds a negotiated multiple of baseline for more than two windows, rollback is automatic and the incident channel receives a pre-filled template. The final paragraph emphasizes that canaries are not a substitute for offline evaluation: they catch integration mistakes, not semantic regressions that only appear in tail queries.
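The latency abort rule can be sketched as a small check over per-window P99 values. This assumes "more than two windows" means more than two consecutive windows, which the text does not state outright; AbortPolicy, should_rollback, and the default multiple of 1.5 are hypothetical placeholders for whatever is actually negotiated.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AbortPolicy:
    baseline_p99_ms: float
    max_multiple: float = 1.5  # hypothetical negotiated multiple of baseline
    max_breaches: int = 2      # roll back once breaches exceed this many windows


def should_rollback(window_p99_ms: Sequence[float], policy: AbortPolicy) -> bool:
    """Return True if P99 stayed above the threshold for more than
    policy.max_breaches consecutive windows (assumed interpretation)."""
    threshold = policy.baseline_p99_ms * policy.max_multiple
    streak = 0
    for p99 in window_p99_ms:
        # A window at or below the threshold resets the streak.
        streak = streak + 1 if p99 > threshold else 0
        if streak > policy.max_breaches:
            return True
    return False
```

With a baseline of 100 ms and a multiple of 1.5, three windows in a row above 150 ms would trigger rollback, while two breaches followed by a healthy window would not.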