How Goable calibrates to your spots

Physics gives you a starting score; operator outcomes bend it to reality. Every spot in Goable can graduate from a generic base profile to a fully calibrated curve once enough outcomes accumulate.

Why calibration matters

A generic "kitesurfing" profile knows that 18 knots is workable for an intermediate. But a Tarifa school may run sessions at 16 knots while a school in a sheltered Catalan cove cancels them at the same forecast. Same physics, different local thresholds. Calibration learns these thresholds from submitted outcomes.

The calibration loop

1. You score a session via /v1/score. The response carries a sessionId.
2. The session runs (or doesn't). You report what happened via POST /v1/score/:id/outcome, setting outcome_type to one of ran, cancelled, no_show or rescheduled.
3. A weekly Bayesian refit pipeline processes the accumulated outcomes for each (activity × cell × dimension) combination and produces a candidate curve.
4. A validation gate compares the Brier Skill Score of the candidate curve and the previous curve on held-out outcomes. If the candidate wins, it ships; if not, the previous curve persists. Operators are never quietly served a worse model.
5. The drift monitor runs daily on the deployed curve. If skill drops two standard deviations below baseline, an emergency recalibration triggers mid-week.
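The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Goable's implementation: the host in BASE_URL is a placeholder, the request body is assumed to be a JSON object with a single outcome_type field, outcomes are assumed to be encoded as 1 when the session ran and 0 otherwise, and the Brier Skill Score uses the outcome base rate as its reference forecast. All function names are hypothetical.

```python
import json
import statistics
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not from the docs
OUTCOME_TYPES = {"ran", "cancelled", "no_show", "rescheduled"}


def build_outcome_request(session_id, outcome_type):
    """Prepare (but do not send) POST /v1/score/:id/outcome."""
    if outcome_type not in OUTCOME_TYPES:
        raise ValueError(f"unknown outcome_type: {outcome_type!r}")
    # Assumed body shape: a JSON object carrying only outcome_type.
    body = json.dumps({"outcome_type": outcome_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/score/{session_id}/outcome",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes):
    """BSS = 1 - BS / BS_ref, with the base rate as reference forecast."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("BSS is undefined when all outcomes are identical")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


def gate_ships(candidate_probs, previous_probs, held_out_outcomes):
    """Ship the candidate only if it beats the previous curve on held-out data."""
    candidate = brier_skill_score(candidate_probs, held_out_outcomes)
    previous = brier_skill_score(previous_probs, held_out_outcomes)
    return candidate > previous


def drift_detected(todays_skill, baseline_skills):
    """True when skill falls more than two standard deviations below baseline."""
    mean = statistics.mean(baseline_skills)
    sigma = statistics.stdev(baseline_skills)
    return todays_skill < mean - 2 * sigma
```

Building the request separately from sending it keeps the sketch free of network calls; passing the returned object to urllib.request.urlopen would perform the actual POST, with whatever authentication your tenant requires added to the headers.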

Hierarchical fallback

New spots don't have outcomes yet. The engine uses a 5-level spatial hierarchy (base, region, cluster, sub-spot, micro) with Bayesian shrinkage: a new spot borrows its parent's calibrated curve until it has at least 150 paired outcomes of its own. The calibration_provenance block in every score response tells you which level was hit.

Sub-spot resolved with n≥100 outcomes yields hierarchical_calibration = 1.0. Falling all the way back to the base profile yields 0.85. Same scoring engine, fewer guarantees on local accuracy.
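As a rough illustration of the fallback idea, the sketch below picks the most specific level that has enough paired outcomes and shows one common form of shrinkage toward a parent estimate. It is not the engine's actual algorithm: the level ordering (micro as most specific), the use of 150 as the switch-over count, the weighting formula n / (n + k), and the names resolve_level, shrink and prior_strength are all assumptions made for this example.

```python
# Most specific level first; names follow the five levels listed above.
LEVELS = ["micro", "sub-spot", "cluster", "region", "base"]
MIN_PAIRED_OUTCOMES = 150  # threshold quoted on this page


def resolve_level(outcome_counts, min_outcomes=MIN_PAIRED_OUTCOMES):
    """Return the most specific level with enough outcomes, else "base"."""
    for level in LEVELS[:-1]:
        if outcome_counts.get(level, 0) >= min_outcomes:
            return level
    return "base"


def shrink(local_estimate, n_local, parent_estimate, prior_strength=150.0):
    """Blend a local estimate with its parent, weighted by sample size.

    With n_local = 0 the parent estimate is returned unchanged; as n_local
    grows, the result moves toward the local estimate.
    """
    weight = n_local / (n_local + prior_strength)
    return weight * local_estimate + (1.0 - weight) * parent_estimate
```

For example, a spot whose own data suggests a 16-knot workable threshold, under a parent calibrated to 18 knots, would stay at 18 with no outcomes and sit halfway between the two once n_local equals prior_strength.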

Cadence + thresholds

The weekly batch runs on Sundays at 02:00 UTC. A cell becomes eligible for calibration once it has ≥150 paired outcomes; below that, the catalog's prior applies. Every batch ships a markdown audit (visible to ops at /ops/calibration) listing the BSS Δ, the drift safeguards passed, and the scoring sample size.
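If you want to estimate when newly submitted outcomes could first influence a curve, the schedule and threshold above are enough for a small helper. This is a minimal sketch; next_weekly_batch and is_eligible are hypothetical names, and the code assumes the batch starts exactly at 02:00 UTC with no lag.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_batch(now):
    """Next Sunday 02:00 UTC strictly after `now` (a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=2, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    candidate += timedelta(days=(6 - candidate.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def is_eligible(paired_outcomes, threshold=150):
    """A cell qualifies for calibration at or above the threshold."""
    return paired_outcomes >= threshold
```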

Inspecting calibration for your traffic

Tenants on Pro+ can open the verification dashboard on the verification page of the tenant portal. It shows Brier Skill Score and reliability diagrams stratified by horizon, sub-spot and cluster. The numbers come from the same scoring_audit_log × outcomes join that the drift monitor watches.
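For readers unfamiliar with reliability diagrams, the sketch below shows the standard binning behind one: forecasts are grouped into equal-width probability bins, and each bin's mean forecast is compared with the observed frequency. It assumes outcomes encoded as 1 (session ran) or 0, and the function name and default bin count are illustrative; how the dashboard itself bins or stratifies is not specified here.

```python
def reliability_bins(probs, outcomes, n_bins=5):
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[idx][0] += p
        sums[idx][1] += o
        sums[idx][2] += 1
    return [(sp / n, so / n, n) for sp, so, n in sums if n > 0]
```

A well-calibrated curve produces bins where the first two values are close. Stratifying by horizon, sub-spot or cluster amounts to calling the function once per group.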