# OSUN & Co. Outperform Benchmark Report

## Reproducibility Envelope

- Generated (UTC): `2026-03-16T06:34:09+00:00`
- Deterministic seed: `20260312`
- Latency repeats per model: `280`
- Benchmark tasks (10): FermentOS daily bioreactor yield, FarmOS weekly field output, WasteZero daily restaurant covers, SourceGrid lead-time demand pressure index, AutoKitchen hourly fulfillment throughput, GhostFlow dispatch delay pressure index, RentGrid shared-kitchen occupancy demand, RecipeProof release-readiness compliance index, LocalSource packaging landed-cost index, LocalSource ingredient procurement index
- Metrics: sMAPE (symmetric mean absolute percentage error), MAE (mean absolute error), median inference latency (ms), estimated cost per 1k forecasts (USD)
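
The report does not state which sMAPE variant or timing method it uses. As a minimal sketch, assuming the common 0-200% form of sMAPE and wall-clock timing via `time.perf_counter`, the metrics listed above could be computed as follows. The function names are illustrative and are not taken from the benchmark code.

```python
import statistics
import time


def smape(actual, forecast):
    """Symmetric MAPE in percent (0-200 scale). One common variant, assumed here."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Treat a 0/0 pair as a perfect forecast instead of dividing by zero.
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def median_latency_ms(predict, repeats=280):
    """Median wall-clock time of predict() over `repeats` calls, in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Taking the median rather than the mean keeps occasional scheduler or garbage-collection spikes from dominating a sub-millisecond measurement. The default of `280` repeats mirrors the repeat count listed above.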

## Aggregate Result

- sMAPE reduction vs best baseline (mean): **56.021%**
- MAE reduction vs best baseline (mean): **55.982%**
- Latency headroom vs budget `0.05 ms` (mean): **7.78%**; 4 of 10 tasks individually exceed the budget (negative headroom in Task Detail)
- Cost headroom vs budget `$0.0014/1k` (mean): **5.4%**
- Tasks won on sMAPE: **10/10**
- Defensible outperform gate: **True**
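
The report does not spell out how reduction and headroom are derived. The figures in Task Detail are consistent with the relative-difference formulas sketched below, which are inferred from the table values and are not confirmed by the benchmark source.

```python
def reduction_pct(baseline, model):
    """Relative error reduction versus the baseline, in percent."""
    return 100.0 * (baseline - model) / baseline


def headroom_pct(budget, measured):
    """Share of the budget left unused, in percent. Negative means over budget."""
    return 100.0 * (budget - measured) / budget


# Values from the FermentOS and SourceGrid rows of the Task Detail table.
print(round(reduction_pct(3.0763, 1.2699), 2))   # 58.72
print(round(headroom_pct(0.05, 0.0295), 1))      # 41.0
print(round(headroom_pct(0.0014, 0.00128), 3))   # 8.571
print(round(headroom_pct(0.05, 0.0638), 1))      # -27.6
```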

## Data Validation

- Registry status: **loaded**
- Defensible data gate: **True**
- All requirements valid: **True**
- All tasks covered: **True**
- Requirement coverage: **100.0%**
- Avg sources per requirement: **4.0**
- Validation artifact: `/Users/mlwu/Documents/New project 2/osunandco-osun/reports/benchmark_data_validation.json`

## Task Detail

| Task | Best Baseline | Baseline sMAPE | Lux SparseBeam sMAPE | sMAPE Reduction | Lux Latency (ms) | Latency Headroom | Lux Cost ($/1k) | Cost Headroom |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| FermentOS daily bioreactor yield | Seasonal Naive | 3.0763 | 1.2699 | 58.720% | 0.0295 | 41.0% | 0.001280 | 8.571% |
| FarmOS weekly field output | Seasonal Naive | 3.7592 | 1.5984 | 57.480% | 0.0495 | 1.0% | 0.001334 | 4.714% |
| WasteZero daily restaurant covers | Seasonal Naive | 5.3605 | 2.8152 | 47.483% | 0.0324 | 35.2% | 0.001287 | 8.071% |
| SourceGrid lead-time demand pressure index | Seasonal Naive | 2.8479 | 1.0493 | 63.155% | 0.0638 | -27.6% | 0.001372 | 2.000% |
| AutoKitchen hourly fulfillment throughput | Seasonal Naive | 3.6647 | 1.4642 | 60.046% | 0.0283 | 43.4% | 0.001276 | 8.857% |
| GhostFlow dispatch delay pressure index | Seasonal Naive | 4.5708 | 1.7844 | 60.961% | 0.0483 | 3.4% | 0.001330 | 5.000% |
| RentGrid shared-kitchen occupancy demand | Seasonal Naive | 2.4880 | 1.1192 | 55.016% | 0.0634 | -26.8% | 0.001371 | 2.071% |
| RecipeProof release-readiness compliance index | Seasonal Naive | 3.1284 | 1.3667 | 56.313% | 0.0525 | -5.0% | 0.001342 | 4.143% |
| LocalSource packaging landed-cost index | Seasonal Naive | 2.8593 | 1.2711 | 55.545% | 0.0402 | 19.6% | 0.001308 | 6.571% |
| LocalSource ingredient procurement index | Seasonal Naive | 2.9793 | 1.6239 | 45.494% | 0.0532 | -6.4% | 0.001344 | 4.000% |
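
A positive mean can hide individual budget breaches. The sketch below recomputes the mean latency headroom from the latencies in the table and lists the tasks that exceed the `0.05 ms` budget. The variable names are illustrative, and the headroom formula is the one inferred from the table, not one confirmed by the benchmark source.

```python
BUDGET_MS = 0.05

# (task, Lux latency in ms), copied from the Task Detail table.
latencies_ms = [
    ("FermentOS daily bioreactor yield", 0.0295),
    ("FarmOS weekly field output", 0.0495),
    ("WasteZero daily restaurant covers", 0.0324),
    ("SourceGrid lead-time demand pressure index", 0.0638),
    ("AutoKitchen hourly fulfillment throughput", 0.0283),
    ("GhostFlow dispatch delay pressure index", 0.0483),
    ("RentGrid shared-kitchen occupancy demand", 0.0634),
    ("RecipeProof release-readiness compliance index", 0.0525),
    ("LocalSource packaging landed-cost index", 0.0402),
    ("LocalSource ingredient procurement index", 0.0532),
]

# Per-task headroom in percent; negative values are over budget.
headrooms = [100.0 * (BUDGET_MS - lat) / BUDGET_MS for _, lat in latencies_ms]
mean_headroom = sum(headrooms) / len(headrooms)
over_budget = [task for task, lat in latencies_ms if lat > BUDGET_MS]

print(f"mean latency headroom: {mean_headroom:.2f}%")
print(f"tasks over budget: {len(over_budget)}")
for task in over_budget:
    print(" -", task)
```

Run against the table, this reproduces the 7.78% mean from the Aggregate Result section and lists four over-budget tasks.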

## Method Notes

- Synthetic task generators use trend + seasonality + regime shifts + exogenous wave components to mirror production disturbances.
- Lux SparseBeam executes a bounded candidate search on recent holdout slices, then deploys a low-latency seasonal-trend-regime forecaster.
- Cost proxy is deterministic and tied to measured latency and model complexity to keep comparisons reproducible.
- Restaurant modules include zero-waste, automated operations, ghost dispatch, rented-kitchen occupancy, recipe compliance, and local sourcing workloads.
- Research alignment and citations are tracked in `docs/research/t1_corpus.md`.
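
The notes above describe the generator and the search only in outline. As a rough illustration, and not the actual Lux SparseBeam implementation, the sketch below builds a seeded series from the four named components, forecasts with a seasonal-naive baseline, and runs a small bounded search over candidate season lengths on a recent holdout slice. All amplitudes, periods, and function names are hypothetical. The seed value is the one from the Reproducibility Envelope.

```python
import math
import random


def synthetic_series(n, season=7, seed=20260312):
    """Trend + seasonality + one regime shift + slow exogenous wave + noise."""
    rng = random.Random(seed)   # seeded so reruns produce identical data
    shift_at = n // 2           # hypothetical regime-change point
    series = []
    for t in range(n):
        trend = 100.0 + 0.05 * t
        seasonal = 8.0 * math.sin(2 * math.pi * t / season)
        regime = 12.0 if t >= shift_at else 0.0
        wave = 1.5 * math.sin(2 * math.pi * t / 60.0)
        series.append(trend + seasonal + regime + wave + rng.gauss(0.0, 1.0))
    return series


def seasonal_naive(history, horizon, season):
    """Forecast each step with the value from the last observed season."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def smape(actual, forecast):
    """Symmetric MAPE in percent (0-200 scale). One common variant, assumed here."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


def pick_season(history, candidates, holdout):
    """Bounded search: score each candidate season on the last `holdout` points."""
    train, test = history[:-holdout], history[-holdout:]
    scores = {s: smape(test, seasonal_naive(train, holdout, s)) for s in candidates}
    return min(scores, key=scores.get), scores


series = synthetic_series(200)
best, scores = pick_season(series, candidates=(5, 7, 12), holdout=14)
print("best season:", best)
print({s: round(v, 3) for s, v in scores.items()})
```

Keeping the candidate set small and fixed bounds the search cost, and scoring on the most recent slice favors the configuration that fits the current regime. With a generator period of 7, the search would be expected to prefer that candidate, though noise can change the outcome.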
