How Goable calibrates to your spots
Physics gives you a starting score; operator outcomes bend it to reality. Every spot in Goable can graduate from a generic base profile to a fully calibrated curve once enough outcomes accumulate.
Why calibration matters
A generic "kitesurfing" profile knows that 18 knots is workable for an intermediate. But a Tarifa school may run sessions at 16 knots while a windless Catalan cove cancels them. Same physics, different local thresholds. Calibration learns these from submitted outcomes.
The calibration loop
- You score a session via /v1/score. The response carries a sessionId.
- The session runs (or doesn't). You report what happened via POST /v1/score/:id/outcome with outcome_type set to ran / cancelled / no_show / rescheduled.
- A weekly Bayesian refit pipeline trawls the accumulated outcomes per (activity × cell × dimension) and produces a candidate curve.
- A held-out Brier Skill Score validation gate decides whether the new curve beats the previous one. If yes, it ships; if no, the previous curve persists. Operators are never quietly served a worse model.
- The drift monitor runs daily on the deployed curve. If skill drops two sigmas below baseline, an emergency recalibration triggers mid-week.
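As a rough illustration of the reporting step in the loop above, the sketch below builds (but does not send) an outcome request with Python's standard library. Only the path POST /v1/score/:id/outcome and the four outcome_type values come from this page. The host name, the JSON body shape, the helper name build_outcome_request, and the idea that :id is the sessionId returned by /v1/score are assumptions for illustration; authentication is omitted because it is not described here.

```python
import json
from urllib.request import Request

# Outcome types listed in the calibration loop.
OUTCOME_TYPES = {"ran", "cancelled", "no_show", "rescheduled"}


def build_outcome_request(base_url: str, session_id: str, outcome_type: str) -> Request:
    """Build the POST /v1/score/:id/outcome request without sending it."""
    if outcome_type not in OUTCOME_TYPES:
        raise ValueError(f"unknown outcome_type: {outcome_type!r}")
    # Assumption: the :id path segment is the sessionId returned by /v1/score.
    url = f"{base_url.rstrip('/')}/v1/score/{session_id}/outcome"
    # Assumption: the body is a JSON object carrying only outcome_type.
    body = json.dumps({"outcome_type": outcome_type}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and session id, for illustration only.
req = build_outcome_request("https://api.example.invalid", "sess_123", "cancelled")
```

Passing the resulting object to urllib.request.urlopen would send it. Rejecting unknown outcome types on the client side keeps typos out of the data the weekly refit consumes.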
Hierarchical fallback
New spots don't have outcomes yet. The engine uses a 5-level spatial hierarchy (base, region, cluster, sub-spot, micro) with Bayesian shrinkage: your new spot borrows its parent's calibrated curve until it has ≥150 paired outcomes of its own. The calibration_provenance block in every score response tells you which level was hit.
Sub-spot resolved with n≥100 outcomes yields hierarchical_calibration = 1.0. Falling all the way back to the base profile yields 0.85. Same scoring engine, fewer guarantees on local accuracy.
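To make the fallback idea concrete, here is a simplified sketch that picks the most specific hierarchy level with enough outcomes. It is not the engine's implementation: the real engine applies Bayesian shrinkage, which blends levels, whereas this sketch only does a hard selection. The level names and the 150-outcome threshold come from this page; the function name resolve_level, the min_outcomes parameter, and the shape of the input dictionary are hypothetical.

```python
from typing import Dict

# Hierarchy levels named on this page, ordered most specific first.
LEVELS = ("micro", "sub-spot", "cluster", "region", "base")


def resolve_level(outcome_counts: Dict[str, int], min_outcomes: int = 150) -> str:
    """Return the most specific level with at least min_outcomes paired outcomes.

    Assumption: a hard cut-off stands in for Bayesian shrinkage here.
    The base profile is always available as the last resort.
    """
    for level in LEVELS[:-1]:
        if outcome_counts.get(level, 0) >= min_outcomes:
            return level
    return "base"


# A new spot with few outcomes of its own borrows from its cluster.
new_spot = {"micro": 12, "sub-spot": 40, "cluster": 900, "region": 4200}
resolved = resolve_level(new_spot)
```

With the counts above, the micro and sub-spot levels fall short of the threshold, so the lookup settles on the cluster level. A spot with no outcomes anywhere falls through to the base profile.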
Cadence + thresholds
The weekly batch runs Sundays at 02:00 UTC. A cell becomes eligible for calibration once it has ≥150 paired outcomes; below that, the catalog's prior carries. Every batch ships a markdown audit (visible to ops at /ops/calibration) with the BSS Δ, the drift safeguards passed, and the scoring sample size.
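The eligibility threshold and the BSS Δ in the audit can be illustrated with a small sketch. It uses the textbook definition of the Brier Skill Score, 1 - BS / BS_ref, on a held-out set. The choice of reference forecast (a constant base rate), the function names, and the sample numbers are assumptions made for illustration; this page does not specify how the pipeline computes them.

```python
from typing import Sequence

MIN_PAIRED_OUTCOMES = 150  # eligibility threshold stated on this page


def is_eligible(n_paired_outcomes: int) -> bool:
    """A cell is eligible for calibration at 150 or more paired outcomes."""
    return n_paired_outcomes >= MIN_PAIRED_OUTCOMES


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_probs) -> float:
    """BSS = 1 - BS / BS_ref; positive means better than the reference."""
    bs_ref = brier_score(reference_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


def should_ship(candidate_bss: float, previous_bss: float) -> bool:
    """Ship the candidate curve only if it beats the previous one."""
    return candidate_bss > previous_bss


# Made-up held-out data: 1 = session ran, 0 = it did not.
held_out = [1, 0, 1, 1]
reference = [0.75] * 4  # assumption: base-rate reference forecast
previous_bss = brier_skill_score([0.7, 0.4, 0.6, 0.8], held_out, reference)
candidate_bss = brier_skill_score([0.9, 0.2, 0.8, 0.9], held_out, reference)
bss_delta = candidate_bss - previous_bss
```

In this toy data the candidate curve scores higher than the previous one, so should_ship(candidate_bss, previous_bss) is true and bss_delta is positive. Had the candidate scored lower, the previous curve would stay in place.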
Inspecting calibration for your traffic
Tenants on Pro+ can open the verification dashboard in the tenant portal. It shows Brier Skill Score and reliability diagrams stratified by horizon, sub-spot and cluster. The numbers come from the same scoring_audit_log × outcomes join the drift monitor watches.
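If you have not read a reliability diagram before, the sketch below shows the standard way its points are computed: forecasts are grouped into probability bins, and each bin's mean forecast is compared with the observed frequency. This is a generic illustration, not the dashboard's code; the function name, the equal-width binning, and the sample data are assumptions, and stratification by horizon, sub-spot or cluster is left out.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; a forecast of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sums[idx] += p
        hits[idx] += o
        counts[idx] += 1
    return [
        (prob_sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Made-up forecasts and outcomes (1 = session ran).
bins = reliability_bins([0.1, 0.15, 0.9, 0.95], [0, 0, 1, 1])
```

A well-calibrated curve produces points near the diagonal, where the mean forecast in each bin roughly matches the observed frequency.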