# Benchmark Data Requirements

This document defines the validated data inputs required to support the OSUN benchmark suite.

## Validation rules

- Source quality:
  - Government primary source APIs or publications.
  - Peer-reviewed references mapped in `docs/research/t1_corpus.md`.
  - First-party runtime telemetry for operational state.
- Contract quality:
  - Every requirement declares a cadence, a freshness SLA, a minimum row count, a null ceiling, and typed fields.
  - Every numeric field declares explicit lower and upper bounds.
- Coverage quality:
  - Every benchmark task must map to at least one requirement.
  - Every task must have at least two distinct sources.
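The contract-quality rules can be illustrated with a minimal sketch. The `Requirement` and `FieldSpec` classes, their attribute names, and the dtype labels are hypothetical and chosen only for illustration; the actual schema in `benchmarks/validated_data_registry.json` may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical dtype labels; the real registry may use different names.
NUMERIC_DTYPES = {"int", "float"}


@dataclass
class FieldSpec:
    name: str
    dtype: str
    lower: Optional[float] = None  # expected for numeric fields
    upper: Optional[float] = None  # expected for numeric fields


@dataclass
class Requirement:
    requirement_id: str
    cadence: str
    freshness_sla_days: float
    min_rows: int
    null_ceiling: float  # fraction in [0, 1]
    fields: List[FieldSpec] = field(default_factory=list)


def contract_errors(req: Requirement) -> List[str]:
    """Return a list of contract violations; an empty list means the contract holds."""
    errors = []
    if not req.cadence:
        errors.append("missing cadence")
    if req.freshness_sla_days <= 0:
        errors.append("freshness SLA must be positive")
    if req.min_rows < 1:
        errors.append("minimum row count must be at least 1")
    if not 0.0 <= req.null_ceiling <= 1.0:
        errors.append("null ceiling must be a fraction in [0, 1]")
    if not req.fields:
        errors.append("no typed fields declared")
    for spec in req.fields:
        if spec.dtype in NUMERIC_DTYPES:
            if spec.lower is None or spec.upper is None:
                errors.append(f"{spec.name}: numeric field lacks explicit bounds")
            elif spec.lower > spec.upper:
                errors.append(f"{spec.name}: lower bound exceeds upper bound")
    return errors
```

A requirement whose numeric field omits its bounds, for example `FieldSpec("temp", "float")`, would be reported rather than silently accepted.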

The machine-readable registry is at:

- `benchmarks/validated_data_registry.json`

The validator and its report artifacts are at:

- `scripts/validate_benchmark_data.py`
- `reports/benchmark_data_validation.json`
- `reports/benchmark_data_validation.md`

## Task-to-data matrix

| Benchmark task | Requirement ID | Required source groups |
| --- | --- | --- |
| `fermentos_yield` | `fermentos_yield_inputs` | USDA NASS, NOAA, NASA POWER, EIA, first-party telemetry |
| `farmos_output` | `farmos_output_inputs` | USDA NASS, NOAA, NASA POWER, U.S. Census, first-party telemetry |
| `wastezero_covers` | `wastezero_covers_inputs` | USDA ERS, BLS, first-party telemetry |
| `sourcegrid_leadtime` | `sourcegrid_leadtime_inputs` | USDA AMS, U.S. Census, EIA, first-party telemetry |
| `autokitchen_throughput` | `autokitchen_throughput_inputs` | BLS, U.S. Census, first-party telemetry |
| `ghostflow_dispatch` | `ghostflow_dispatch_inputs` | NOAA, EIA, first-party telemetry |
| `rentgrid_occupancy` | `rentgrid_occupancy_inputs` | U.S. Census, BLS, first-party telemetry |
| `recipeproof_compliance` | `recipeproof_compliance_inputs` | FDA additive status, FDA FCS inventory, FDA enforcement, USDA FoodData, first-party telemetry |
| `localsource_packaging` | `localsource_packaging_inputs` | USDA AMS, FDA FCS inventory, EIA, first-party telemetry |
| `localsource_ingredients` | `localsource_ingredients_inputs` | USDA AMS, USDA FoodData, USDA ERS, NOAA, first-party telemetry |
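The coverage rules (every task mapped, at least two distinct sources per task) can be checked against the matrix above with a short sketch. The `TASK_SOURCES` dictionary copies three rows from the table for brevity; the function names and the in-memory layout are assumptions, not the registry's actual format.

```python
# Three rows copied from the matrix above; the dict layout is illustrative only.
TASK_SOURCES = {
    "wastezero_covers": ["USDA ERS", "BLS", "first-party telemetry"],
    "ghostflow_dispatch": ["NOAA", "EIA", "first-party telemetry"],
    "rentgrid_occupancy": ["U.S. Census", "BLS", "first-party telemetry"],
}


def uncovered_tasks(tasks, task_sources):
    """Tasks with no entry, or an empty entry, in the source mapping."""
    return sorted(t for t in tasks if not task_sources.get(t))


def under_sourced_tasks(task_sources, minimum=2):
    """Tasks with fewer than `minimum` distinct sources (duplicates count once)."""
    return sorted(
        task
        for task, sources in task_sources.items()
        if len(set(sources)) < minimum
    )


if __name__ == "__main__":
    print(uncovered_tasks(["wastezero_covers", "farmos_output"], TASK_SOURCES))
    print(under_sourced_tasks(TASK_SOURCES))
```

Deduplicating with `set` matters here: a task listing the same source twice should not count as having two distinct sources.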

## Operational refresh envelope

| Suite | Cadence target | Freshness max | Null ceiling |
| --- | --- | ---: | ---: |
| AgTech | Daily/weekly | 7-10 days | 2-3% |
| FoodTech | Daily | 3-5 days | 1-2% |
| RestaurantOps | Hourly/daily | 2-7 days | 1-2% |
| LocalSource | Daily | 3-5 days | 2% |
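As a rough sketch of how the envelope might be applied to one dataset, the snippet below checks data age and null fraction per suite. It assumes the upper end of each range in the table is the hard limit, which the table itself does not state; `ENVELOPE`, `null_fraction`, and `within_envelope` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the upper end of each range in the table is the hard limit.
# suite -> (maximum age in days, maximum null fraction)
ENVELOPE = {
    "AgTech": (10, 0.03),
    "FoodTech": (5, 0.02),
    "RestaurantOps": (7, 0.02),
    "LocalSource": (5, 0.02),
}


def null_fraction(values):
    """Share of missing (None) entries; an empty column counts as fully null."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def within_envelope(suite, last_updated, values, now=None):
    """True when the data is fresh enough and has few enough nulls for the suite."""
    max_age_days, null_ceiling = ENVELOPE[suite]
    now = now or datetime.now(timezone.utc)
    fresh = now - last_updated <= timedelta(days=max_age_days)
    return fresh and null_fraction(values) <= null_ceiling
```

Passing `now` explicitly keeps the check deterministic in tests; `last_updated` is expected to be a timezone-aware `datetime` so that the subtraction is well defined.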

## Benchmark gate policy

Data validation is considered defensible only when all of the following are true:

1. `all_requirements_valid = true`
2. `all_tasks_covered = true`
3. `avg_sources_per_requirement >= 3.0`
4. `median_freshness_days <= 10`
5. `min_sources_across_tasks >= 2`

These conditions are enforced when the validator runs as `scripts/validate_benchmark_data.py --strict`, and are consumed by `scripts/verify.sh`.
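The five gate conditions can be expressed as a small table-driven check. This is a sketch only: it assumes the metrics are available as top-level keys in a flat dictionary, which may not match the actual layout of `reports/benchmark_data_validation.json`, and `failed_gates` is a hypothetical helper rather than the validator's own code.

```python
import operator

# (metric name, comparison, threshold), one entry per gate condition.
GATES = [
    ("all_requirements_valid", operator.is_, True),
    ("all_tasks_covered", operator.is_, True),
    ("avg_sources_per_requirement", operator.ge, 3.0),
    ("median_freshness_days", operator.le, 10),
    ("min_sources_across_tasks", operator.ge, 2),
]


def failed_gates(summary):
    """Return the names of gates that fail; a missing metric counts as a failure."""
    failures = []
    for key, compare, threshold in GATES:
        if key not in summary or not compare(summary[key], threshold):
            failures.append(key)
    return failures


if __name__ == "__main__":
    # Illustrative values only, not real validation output.
    summary = {
        "all_requirements_valid": True,
        "all_tasks_covered": True,
        "avg_sources_per_requirement": 3.9,
        "median_freshness_days": 12,
        "min_sources_across_tasks": 3,
    }
    print(failed_gates(summary))
```

Treating a missing metric as a failure keeps the gate conservative: an incomplete report cannot pass by omission.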
