Data Inventory & Feature Map

What this page is for

This page documents:

  • What data we store for each team and season
  • Where it comes from (data sources and ingestion workflows)
  • Which models and pages use each data block
  • How to inspect team data using the CLI tool

Use this as a reference when adding new data sources, understanding what's available for model development, debugging missing or incorrect data, or planning new features.

Game & Market Data

Sources:

  • CFBD: Game schedules, results, scores, game times
  • OddsAPI: Primary odds source (spreads, totals, moneylines)
  • SGO: Backup odds source (used when OddsAPI is unavailable)
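
The primary/backup relationship between the two odds sources can be sketched as a per-game preference. This is a minimal illustration, not the project's actual selection logic: the `MarketLine` shape, the `pickLine` helper, and the assumption that fallback happens per game are all hypothetical.

```typescript
// Hypothetical row shape; the real market_lines columns may differ.
type OddsSource = "OddsAPI" | "SGO";

interface MarketLine {
  gameId: string;
  source: OddsSource;
  spread: number | null;
}

// Prefer an OddsAPI line for the game; fall back to SGO when none exists.
function pickLine(lines: MarketLine[], gameId: string): MarketLine | undefined {
  const forGame = lines.filter((l) => l.gameId === gameId);
  return (
    forGame.find((l) => l.source === "OddsAPI") ??
    forGame.find((l) => l.source === "SGO")
  );
}

// Made-up sample rows for illustration only.
const sampleLines: MarketLine[] = [
  { gameId: "g1", source: "SGO", spread: -3.5 },
  { gameId: "g1", source: "OddsAPI", spread: -3 },
  { gameId: "g2", source: "SGO", spread: 7 },
];
```

With the sample rows, `pickLine(sampleLines, "g1")` returns the OddsAPI line, while `pickLine(sampleLines, "g2")` falls back to SGO.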

Tables:

  • games: Game records with teams, dates, scores, locations
  • market_lines: Betting odds from various sources (OddsAPI, SGO)
  • bets: Generated picks from models (Hybrid V2, V4 Labs, etc.)

Used by:

  • Current Slate (/picks): Shows upcoming games with model picks and odds
  • Week Review (/weeks/review): Historical performance by week
  • Season Review (/season-review): Season-long profitability analysis
  • Labs (/labs): Experimental overlays (V4, Fade V4)
  • Backtests: Strategy performance analysis

Core Team-Season Efficiency (CFBD)

Source: CFBD Advanced Stats API (aggregated season-level stats)

Table: team_season_stats (numeric columns)

Fields:

Offense:

  • ypp_off: Yards per play
  • success_off: Success rate
  • pass_ypa_off: Passing yards per attempt
  • rush_ypc_off: Rushing yards per carry
  • pace_off: Plays per game (tempo)
  • epa_off: Expected points added

Defense:

  • ypp_def: Yards per play allowed
  • success_def: Success rate allowed
  • pass_ypa_def: Passing yards per attempt allowed
  • rush_ypc_def: Rushing yards per carry allowed
  • pace_def: Opponent plays per game
  • epa_def: Expected points added allowed (lower is better for the defense; negative values are good)

Used by:

  • Hybrid V2: Core spread model uses these efficiency metrics
  • Ratings calculations: Power ratings and implied lines
  • Matchup analysis: Team strengths/weaknesses comparisons
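
For model code, the numeric columns above could be typed roughly as follows. The field names come from this page; treating every column as nullable, and the `netYpp` helper, are assumptions for illustration.

```typescript
// Field names mirror the columns listed above; nullability is an assumption.
interface TeamSeasonEfficiency {
  ypp_off: number | null;
  success_off: number | null;
  pass_ypa_off: number | null;
  rush_ypc_off: number | null;
  pace_off: number | null;
  epa_off: number | null;
  ypp_def: number | null;
  success_def: number | null;
  pass_ypa_def: number | null;
  rush_ypc_def: number | null;
  pace_def: number | null;
  epa_def: number | null;
}

// Hypothetical helper: yards-per-play margin (offense minus defense allowed).
function netYpp(row: TeamSeasonEfficiency): number | null {
  if (row.ypp_off === null || row.ypp_def === null) return null;
  return row.ypp_off - row.ypp_def;
}

// Made-up values, not real team data.
const sampleRow: TeamSeasonEfficiency = {
  ypp_off: 6.5, success_off: 0.47, pass_ypa_off: 8.1, rush_ypc_off: 4.9,
  pace_off: 70, epa_off: 0.12,
  ypp_def: 5.0, success_def: 0.38, pass_ypa_def: 6.4, rush_ypc_def: 3.7,
  pace_def: 66, epa_def: -0.05,
};
```

Returning `null` when either side is missing keeps a gap in the data from silently turning into a zero margin.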

Drive-Based Metrics (CFBD Drives → raw_json.drive_stats)

Source: CFBD Drives API (per-drive data)

Table: team_season_stats.raw_json.drive_stats

Fields:

  • Tempo: Average seconds per drive, quality drives, quality drive rate
  • Finishing Drives: Scoring opportunities, points per opportunity (offense/defense)
  • Available Yards: Average yards available, yards gained, available yards percentage (offense/defense)

Used by:

  • V4 Labs: Drive-based spread model (experimental)
  • Future V5: Potential integration into Hybrid model
  • Labs overlays: Drive efficiency analysis
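
As a rough illustration of how two of these drive metrics are commonly derived, the sketch below aggregates a list of drives. The `Drive` shape and its field names are hypothetical, and the exact definitions used by the ingestion job (for example, what counts as a scoring opportunity) are not specified on this page.

```typescript
// Hypothetical per-drive record; not the actual CFBD or raw_json schema.
interface Drive {
  yardsAvailable: number; // distance to the goal line at drive start
  yardsGained: number;
  points: number;
  scoringOpportunity: boolean; // assumed to be flagged upstream
}

interface DriveSummary {
  availableYardsPct: number | null;
  pointsPerOpportunity: number | null;
}

function summarizeDrives(drives: Drive[]): DriveSummary {
  let available = 0;
  let gained = 0;
  let opportunities = 0;
  let opportunityPoints = 0;
  for (const d of drives) {
    available += d.yardsAvailable;
    gained += d.yardsGained;
    if (d.scoringOpportunity) {
      opportunities += 1;
      opportunityPoints += d.points;
    }
  }
  // Return null instead of dividing by zero when there is nothing to average.
  return {
    availableYardsPct: available > 0 ? gained / available : null,
    pointsPerOpportunity: opportunities > 0 ? opportunityPoints / opportunities : null,
  };
}

// Made-up drives for illustration.
const sampleDrives: Drive[] = [
  { yardsAvailable: 75, yardsGained: 75, points: 7, scoringOpportunity: true },
  { yardsAvailable: 80, yardsGained: 40, points: 0, scoringOpportunity: false },
  { yardsAvailable: 45, yardsGained: 30, points: 3, scoringOpportunity: true },
];
```

For the sample drives, 145 of 200 available yards are gained and the two scoring opportunities yield 10 points, or 5 points per opportunity.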

Talent, Recruiting & Roster Churn (raw_json.roster_churn)

Source: CFBD Returning Production + Transfer Portal APIs

Table: team_season_stats.raw_json.roster_churn

Fields:

  • Returning Production: Percentage of production returning by position group (offense/defense, QB, RB, WR, OL, DL, LB, DB, etc.)
  • Transfer Portal: Transfers in, transfers out, net transfers

Used by:

  • Portal & NIL Indices (V5 - Planned): Will feed Continuity Score, Positional Shock, Mercenary Index, Portal Aggressor
  • Future Labs overlays: Roster stability analysis
  • Preseason adjustments: Power rating adjustments based on roster turnover
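
When debugging this block, a quick consistency check on the portal counts can help. The sketch below assumes a `transferPortal` object with `inCount`, `outCount`, and `netCount`; only `transferPortal.netCount` and `returningProduction` are named on this page, so the other field names are hypothetical.

```typescript
// Assumed shape of raw_json.roster_churn; inCount and outCount are hypothetical names.
interface RosterChurn {
  returningProduction?: Record<string, number>;
  transferPortal?: { inCount: number; outCount: number; netCount: number };
}

// Returns a list of human-readable problems; an empty list means the counts look consistent.
function checkPortalCounts(churn: RosterChurn): string[] {
  const problems: string[] = [];
  const portal = churn.transferPortal;
  if (!portal) {
    problems.push("transferPortal block is missing");
    return problems;
  }
  if (portal.inCount < 0 || portal.outCount < 0) {
    problems.push("transfer counts must be non-negative");
  }
  if (portal.netCount !== portal.inCount - portal.outCount) {
    problems.push("netCount does not equal inCount - outCount");
  }
  return problems;
}
```

For example, `checkPortalCounts({ transferPortal: { inCount: 12, outCount: 9, netCount: 3 } })` returns an empty array.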

Labs / Experimental Blocks (raw_json.sgo_stats)

Source: SportsGameOdds API (curated stats)

Table: team_season_stats.raw_json.sgo_stats

Fields:

  • Red Zone: Trips, touchdowns, touchdown rate
  • Penalties: Count, yards, first downs, per-game rates
  • Pressure/Havoc: Sacks, TFLs, QB hits, INTs, fumbles (offense/defense)
  • Special Teams: Punting, returns, field goals
  • Game Script: Largest lead, seconds in lead, lead changes, scoring runs, ties

⚠️ Labs-only, optional

  • Not used in production models (Hybrid V2, ratings)
  • Used for future V5 model development
  • Safe to ignore if SGO plan is disabled
  • May be deprecated if not proven useful
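
Because this block is optional, code that reads it should tolerate its absence. A minimal sketch, assuming a hypothetical `redZone` sub-object with `trips` and `touchdowns` (the real key names inside sgo_stats may differ):

```typescript
// Hypothetical nesting; only the sgo_stats key itself is named on this page.
type RawJson = {
  sgo_stats?: { redZone?: { trips: number; touchdowns: number } };
} | null;

// Red-zone touchdown rate, or null when the block is absent or has no trips.
function redZoneTdRate(raw: RawJson): number | null {
  const rz = raw?.sgo_stats?.redZone;
  if (!rz || rz.trips <= 0) return null;
  return rz.touchdowns / rz.trips;
}
```

Callers can then treat `null` as "no SGO data" and skip the overlay, which matches the intent that the block is safe to ignore.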

Portal & NIL Meta Indices (V5 – Planned)

🚧 Not yet implemented (stubs exist in apps/jobs/src/talent/portal_indices.ts)

These four indices will be computed from raw_json.roster_churn data. They will be used as Labs overlays first, then potentially integrated into the V5 Hybrid model.

1. Continuity Score

Data Source: CFBD returning production + transfer portal net counts

Intended Use: Labs overlay to identify teams with high or low roster stability; if stable, may adjust Hybrid V2 confidence or power ratings in V5.
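
Since the index is not yet implemented, the following is only one possible shape for it, not the planned formula. The 0 to 100 scale, the weights, and the `netCap` scaling are all placeholder assumptions.

```typescript
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Hypothetical Continuity Score on a 0-100 scale.
// returningPct: overall returning production as a fraction (0..1).
// netCount: transfers in minus transfers out.
// Weights and netCap are placeholders, not tuned values.
function continuityScore(
  returningPct: number,
  netCount: number,
  weights = { returning: 0.8, portal: 0.2 },
  netCap = 10,
): number {
  // Map netCount from [-netCap, +netCap] onto [0, 1], clamping outliers.
  const portalTerm = clamp01(0.5 + netCount / (2 * netCap));
  const blended = weights.returning * clamp01(returningPct) + weights.portal * portalTerm;
  return 100 * clamp01(blended);
}
```

Under these assumptions a team returning all production with a strongly positive portal balance scores 100, and a team returning nothing with a strongly negative balance scores 0.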

2. Positional Shock Index

Data Source: Position-group breakdowns from roster_churn.returningProduction

Intended Use: Labs overlay to flag teams with extreme turnover at key positions (QB, OL, DL); may inform matchup-specific adjustments in V5.
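
One simple way to flag extreme turnover is a threshold on returning production at the key positions named above. This is a sketch only: the 0.4 threshold is a placeholder, and it assumes `returningProduction` maps position codes to fractions between 0 and 1.

```typescript
const KEY_POSITIONS = ["QB", "OL", "DL"] as const;

// Returns the key positions whose returning production falls below the threshold.
// Positions missing from the input are skipped rather than flagged.
function shockedPositions(
  returningProduction: Record<string, number | undefined>,
  threshold = 0.4, // placeholder cutoff, not a tuned value
): string[] {
  return KEY_POSITIONS.filter((pos) => {
    const value = returningProduction[pos];
    return value !== undefined && value < threshold;
  });
}
```

For example, `shockedPositions({ QB: 0.1, OL: 0.7, WR: 0.2 })` flags only QB: OL is above the cutoff, DL is missing, and WR is not treated as a key position here.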

3. Mercenary Index

Data Source: 1-year transfers and short-eligibility players from transfer portal data

Intended Use: Labs overlay to identify teams heavily reliant on transfers; may adjust for chemistry/cohesion factors in V5.

4. Portal Aggressor Flag

Data Source: Net talent gain from transfers (transferPortal.netCount + talent ratings if available)

Intended Use: Labs overlay to flag teams that aggressively use the portal; may inform power rating adjustments in V5 if portal-heavy teams show consistent patterns.

Team Data Inspector (CLI)

Script: scripts/inspect-team-data.ts

Usage:

npx tsx scripts/inspect-team-data.ts --season 2025 --team lsu
npx tsx scripts/inspect-team-data.ts --season 2024 --team "Ohio State"
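
The two invocations pass the team as a slug and as a display name. How the script resolves them is not documented here; one plausible approach, assuming slugs are lowercase and hyphen-separated, is to normalize the argument before lookup. The `toSlug` helper is hypothetical.

```typescript
// Hypothetical normalizer: lowercases, replaces runs of non-alphanumerics
// with a single hyphen, and trims leading/trailing hyphens.
function toSlug(team: string): string {
  return team
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}
```

Under this assumption, `toSlug("Ohio State")` yields `ohio-state` and `toSlug("lsu")` is unchanged.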

What it prints:

  • Header: Season, team name, slug, teamId
  • Core efficiency: All numeric columns from team_season_stats
  • Drive stats: Contents of raw_json.drive_stats (if present)
  • Roster churn: Contents of raw_json.roster_churn (if present)
  • SGO stats: Contents of raw_json.sgo_stats (if present)
  • Portal indices: Contents of raw_json.portal_meta (if present, future)
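
The "(if present)" behavior for the raw_json blocks could look something like the sketch below. The block keys are the ones listed above; the `describeBlocks` helper and its output format are illustrative and not taken from the actual script.

```typescript
const BLOCK_KEYS = ["drive_stats", "roster_churn", "sgo_stats", "portal_meta"] as const;

// Produces one status line per known raw_json block.
// A key that is missing or explicitly null is reported as not present.
function describeBlocks(rawJson: Record<string, unknown> | null): string[] {
  return BLOCK_KEYS.map((key) => {
    const value = rawJson?.[key];
    const present = value !== undefined && value !== null;
    return `${key}: ${present ? "present" : "(not present)"}`;
  });
}

// Made-up raw_json for illustration.
const statusLines = describeBlocks({ drive_stats: { tempo: {} }, sgo_stats: null });
```

Keeping the key list in one constant means a newly added block only needs one extra entry to show up in the inspector output.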

Why it's useful:

  • Quick verification that data exists for a team/season
  • Debug missing or incorrect data
  • Understand what's available before writing model code
  • Compare data across teams/seasons

Adding New Data Blocks

When adding new data to team_season_stats.raw_json:

  1. Document it here in the appropriate section
  2. Update workflows-guide.md with ingestion workflow details
  3. Update inspect-team-data.ts to display the new block
  4. Add a stub or implementation in the relevant module (e.g., portal_indices.ts for portal/NIL data)
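
When an ingestion job writes a new block, it should leave the existing blocks in raw_json untouched. A minimal sketch of that merge, assuming raw_json is a plain JSON object; the `withBlock` helper is hypothetical and the actual persistence call is omitted.

```typescript
type RawJson = Record<string, unknown>;

// Returns a new object with the block set under `key`, preserving all other
// blocks. The input object is not mutated.
function withBlock(rawJson: RawJson | null, key: string, block: unknown): RawJson {
  return { ...(rawJson ?? {}), [key]: block };
}

// Made-up example: adding a new block next to an existing one.
const before: RawJson = { drive_stats: { tempo: {} } };
const after = withBlock(before, "portal_meta", { continuityScore: null });
```

Overwriting the whole raw_json column with only the new block would silently drop drive_stats, roster_churn, and sgo_stats, which is the failure this pattern avoids.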

Related Documentation