Data intake
STAGE 1: Live odds from The Odds API across 12 retail + sharp books (DraftKings, FanDuel, BetMGM, Caesars, Circa, Pinnacle, plus regional/offshore). Injury feeds from team beat writers + official sources. Weather feeds for outdoor games. Line movement history with timestamps down to the minute.
- 12-book odds scraper running every 5 min
- Injury + lineup feeds polled every 15 min on game days
- Weather (wind + precip) for outdoor sports polled every 30 min
- Historical odds + results archive (5 seasons per major league)
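The staggered cadences above can be sketched as a single scheduler tick. This is a minimal illustration, not the production scraper; the feed names and the `feeds_due` helper are hypothetical, with only the intervals taken from the description.

```python
# Polling intervals from the stage description (minutes).
POLL_INTERVALS_MIN = {
    "odds_12_book": 5,       # 12-book odds scraper, every 5 min
    "injuries_lineups": 15,  # injury + lineup feeds, every 15 min
    "weather_outdoor": 30,   # wind + precip for outdoor games, every 30 min
}

def feeds_due(elapsed_min: int) -> list[str]:
    """Return the feeds whose interval evenly divides the elapsed minutes."""
    return [name for name, interval in POLL_INTERVALS_MIN.items()
            if elapsed_min % interval == 0]

# At minute 30, every feed fires; at minute 5, only the odds scraper does.
```

A real deployment would drive this from a job scheduler rather than a modulo check, but the divisibility rule captures why the three cadences align every 30 minutes.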
Feature engineering
STAGE 2: Raw data becomes ~80 features per game per sport: pace-adjusted efficiency, rest differential, home/road splits, usage rates, recent form weighted exponentially by recency, schedule density, and opponent-adjusted metrics. The feature set is versioned and reproducible.
- 80+ per-game features across NFL / NBA / MLB / NHL / NCAAF / NCAAB
- Pace-adjusted (not raw) efficiency numbers per team
- Recency-weighted form (last 5 games > last 10 > last 20)
- Opponent-adjusted variance, not just averages
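Exponential recency weighting can be sketched as follows. The half-life of 5 games is an illustrative assumption chosen to match the "last 5 > last 10 > last 20" ordering, not a documented constant.

```python
def recency_weights(n_games: int, half_life: float = 5.0) -> list[float]:
    """Normalized exponential-decay weights, most recent game first.

    A game `half_life` games back gets half the weight of the most
    recent one (half_life=5.0 is an assumed, illustrative value).
    """
    raw = [0.5 ** (i / half_life) for i in range(n_games)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_form(metric_recent_first: list[float]) -> float:
    """Recency-weighted average of any per-game metric."""
    weights = recency_weights(len(metric_recent_first))
    return sum(m * w for m, w in zip(metric_recent_first, weights))
```

Because the weights are normalized, a team whose metric is constant over the window gets exactly that constant back, while a hot recent stretch pulls the form number toward the latest games.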
XGBoost ensemble scoring
STAGE 3: An ensemble of gradient-boosted decision trees per sport, trained on 3+ seasons of historical data with walk-forward cross-validation (never train on the future, never leak results). The model outputs a win probability and confidence tier for each market per game. Sub-100 ms inference.
- Per-sport ensembles (not one-size-fits-all across leagues)
- Walk-forward CV prevents data leakage
- Out-of-sample test on the most recent season, held out during training
- Inference served via lightweight FastAPI, <100 ms per prediction
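The walk-forward split logic referenced above can be sketched in a few lines. This is a generic illustration of the technique (each test fold strictly follows its training window in time), with fold counts and sizes as assumed parameters; the actual training code is not shown in the source.

```python
def walk_forward_splits(n_games: int, n_folds: int, min_train: int):
    """Yield (train_indices, test_indices) pairs where the test window
    always comes strictly after the training window in chronological
    order -- no future games ever leak into training."""
    fold_size = (n_games - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = min(train_end + fold_size, n_games)
        yield list(range(train_end)), list(range(train_end, test_end))

# Each successive fold grows the training window and slides the test
# window forward, mimicking how the model would actually be used live.
```

Contrast this with shuffled k-fold CV, which would score the model on games that occurred before some of its training data and overstate accuracy.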
SHAP explainability
STAGE 4: Every prediction is decomposed into per-feature SHAP values. Users see not just the confidence tier but which factors contributed: +0.8% from rest differential, +0.4% from injury impact, -0.2% from line movement, and so on. No black boxes. Users who disagree with the weighting can override with context the model can't see.
- SHAP values computed per pick, stored with the pick record
- Top 5 factors surfaced in the Discord embed
- Full factor breakdown visible on the Elite-tier dashboard
- SHAP drift monitoring flags when a factor starts behaving oddly
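The additive property that makes this work: SHAP values for one prediction sum, together with a base value, to the model's output, so the "top factors" view is just a magnitude sort over that decomposition. A minimal sketch, with the factor names and numbers taken from the illustrative example above (in production the values would come from the `shap` library, not be hand-supplied):

```python
def top_factors(base_prob: float, shap_values: dict[str, float], k: int = 5):
    """Return (prediction, top-k factors by absolute contribution).

    SHAP additivity: base value + sum of per-feature contributions
    equals the model's predicted probability for this pick.
    """
    prediction = base_prob + sum(shap_values.values())
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked[:k]

# Illustrative contributions (probability points, as in the stage text);
# the 0.52 base rate is an assumed example value.
pred, top = top_factors(0.52, {
    "rest_differential": +0.008,
    "injury_impact":     +0.004,
    "line_movement":     -0.002,
})
```

Sorting by absolute value is what lets a large negative factor (a red flag) outrank a small positive one in the surfaced top 5.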
Closing Line Value tracker
STAGE 5: Every posted pick is compared against the final closing number across all 12 books when the game starts. Positive CLV means we beat the market; negative means we didn't. CLV over a 100+ pick window is the single honest indicator of whether the model has edge.
- Settlement job runs nightly across all tracked picks
- CLV aggregated per sport, per market, and per confidence tier
- Public rolling 30 / 60 / 90 day CLV on the Elite dashboard
- Model target: sustained +2% average CLV (sharp territory)
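One common way to express CLV, sketched below, is the difference in implied probability between the price taken and the closing price; the source doesn't specify its exact formula, so treat this as an assumed but standard convention.

```python
def implied_prob(american: int) -> float:
    """Implied probability of a single American-odds price (vig included)."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

def clv_points(bet_price: int, close_price: int) -> float:
    """Closing Line Value in probability points.

    Positive: the price we took implies a lower probability than the
    close, i.e. we got a better number than the market settled on.
    """
    return round((implied_prob(close_price) - implied_prob(bet_price)) * 100, 2)

# Took +110, market closed +100: we beat the close by ~2.4 points.
# Took -110, market closed -105: the line moved against us, negative CLV.
```

Under this convention the stage's +2% target reads as: across 100+ picks, the average pick is taken about two probability points better than where the market closes.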
Daily retrain feedback loop
STAGE 6: Settled results feed back into the training set. The model retrains on a rolling window (weekly full retrain, daily incremental fine-tune on fresh data) so tomorrow's predictions are informed by yesterday's results. Over 90 days the model measurably sharpens against the closing line.
- Weekly full retrain with updated feature importances
- Daily incremental fine-tune on the most recent game nights
- A/B shadow models evaluated before any live swap
- Automatic rollback if live CLV drops 0.5+ points vs. the prior version over 50 picks
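The automatic rollback rule stated above reduces to a single guard. A minimal sketch, with the threshold and minimum sample taken from the description and the function name itself hypothetical:

```python
def should_rollback(live_clv: float, prior_clv: float, n_picks: int,
                    drop_threshold: float = 0.5, min_picks: int = 50) -> bool:
    """Revert to the prior model version if the live model's average CLV
    has dropped by drop_threshold+ points over at least min_picks picks.

    The pick minimum keeps a cold streak of a dozen games from
    triggering a rollback on noise alone.
    """
    return n_picks >= min_picks and (prior_clv - live_clv) >= drop_threshold
```

Requiring both conditions is the point: the CLV gap alone is meaningless on a small sample, and a large sample with a sub-threshold gap is within normal variance.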