
How we score Akros against weighed-truth meals.

This page is the full version of what /accuracy summarises. Methodology revisions are listed, with version numbers, at the bottom of this page.

1. Datasets

The weekly run draws on two sources of public labelled data:

•Snap-It / Food-101 / Recipe1M+ for plate-photo accuracy. Each image has a weighed-grams label per ingredient, normalised to USDA SR-Legacy macros. Images showing multiple plates or partial occlusion are excluded (about 6% of images dropped).

•A Nutritionix daily-log slice for meals described in free text, where users weighed their own food. Log entries whose description includes more than 4 components are excluded, because the labelling noise floor on free-text logs jumps sharply past 4 components.
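The two dataset-level filters above can be sketched as simple predicates. This is a minimal illustration only: the record types `PhotoRecord` and `LogRecord` and their fields are hypothetical, since the real dataset schemas are not described on this page.

```python
from dataclasses import dataclass

# Hypothetical record shapes; the real dataset schemas may differ.
@dataclass
class PhotoRecord:
    plate_count: int
    partially_occluded: bool

@dataclass
class LogRecord:
    component_count: int

# Free-text label noise rises sharply past this many components.
MAX_LOG_COMPONENTS = 4

def keep_photo(record: PhotoRecord) -> bool:
    """Keep only single-plate, unoccluded plate photos."""
    return record.plate_count == 1 and not record.partially_occluded

def keep_log(record: LogRecord) -> bool:
    """Keep only free-text log entries with at most 4 components."""
    return record.component_count <= MAX_LOG_COMPONENTS
```

Applying these before scoring keeps the dataset filters separate from the per-run exclusions, which are logged individually.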

We do not include Akros users' own logs in this benchmark. Leaking user-generated data into the training set is the most common way published accuracy numbers get gamed; our benchmark is air-gapped from production traffic.

2. Scoring

Primary metric: Mean Absolute Error (MAE) of the kcal estimate against the labelled truth, reported in percent. For multi-component meals, the absolute per-component errors are summed before MAE is computed, so positive and negative component errors cannot cancel each other out.
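A minimal sketch of the headline metric, assuming that "in percent" means each meal's summed absolute component error divided by that meal's labelled total kcal, then averaged across meals. The exact normalisation is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

def meal_abs_error_pct(estimated: Sequence[float], truth: Sequence[float]) -> float:
    """Per-meal kcal error as a percentage of the labelled total.

    Absolute per-component errors are summed first, so an over-estimate on
    one component cannot cancel an under-estimate on another.
    """
    if len(estimated) != len(truth):
        raise ValueError("estimated and truth components must align")
    abs_error = sum(abs(e - t) for e, t in zip(estimated, truth))
    return 100.0 * abs_error / sum(truth)

def kcal_mae_pct(meals: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean of the per-meal percentage errors across one run."""
    errors = [meal_abs_error_pct(est, tru) for est, tru in meals]
    return sum(errors) / len(errors)

# Meal 1: +50 and -50 kcal errors would cancel if signed; summed as absolutes they give 20%.
# Meal 2: a single component off by 50 kcal on 500 gives 10%.
meals = [([300.0, 200.0], [250.0, 250.0]), ([450.0], [500.0])]
print(kcal_mae_pct(meals))  # prints 15.0
```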

Secondary metrics (logged in the internal dashboard, not shown on the public page): per-macro MAE (protein / carbs / fat, in grams), median per-meal latency, and median per-meal model cost. The public page headlines only the kcal MAE because it is the number users actually care about.

We use the median, not the mean, for latency and cost: a single 30-second outlier from a slow vision-model run would distort the mean and is not representative of typical UX.
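To illustrate the effect, here is a small example with hypothetical latency values: nine typical runs plus one 30-second outlier. The numbers are invented for illustration and are not Akros measurements.

```python
import statistics

# Hypothetical per-meal latencies in seconds: nine typical runs, one slow outlier.
latencies_s = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.4, 1.3, 1.2, 30.0]

mean_s = statistics.mean(latencies_s)      # pulled up by the single outlier
median_s = statistics.median(latencies_s)  # middle of the sorted values, unaffected
print(f"mean={mean_s:.2f}s median={median_s:.2f}s")  # prints: mean=4.16s median=1.30s
```

One outlier more than triples the mean, while the median stays at the typical value.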

3. Exclusions

Beyond the dataset-level filters in section 1, a meal is excluded from a run for only one of three reasons, each logged in the run's notes field:

•The image fails our pre-flight validation (file corruption, stripped EXIF data, or resolution below 512 px).

•The labelled truth lists ingredients not present in USDA / AUSNUT (~0.4%).

•The pipeline still returns an HTTP 5xx error after the configured 3-attempt retry. We do not silently drop these meals: the failure rate is reported as a separate number alongside MAE.
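The exclusion rules can be sketched as a scoring step that either returns an estimate or records exactly one of the three logged reasons. This is a minimal sketch under stated assumptions: `ServerError` is a hypothetical stand-in for an HTTP 5xx response, the pipeline is modelled as a zero-argument callable, and the failure rate is assumed to be computed over meals actually sent to the pipeline.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # the configured 3-attempt retry

class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from the pipeline."""

class Exclusion(Enum):
    PREFLIGHT = "image failed pre-flight validation"
    UNKNOWN_INGREDIENT = "labelled truth has an ingredient not in USDA / AUSNUT"
    SERVER_ERROR = "HTTP 5xx after all retry attempts"

@dataclass
class RunNotes:
    excluded: dict = field(default_factory=dict)  # meal_id -> Exclusion
    sent_to_pipeline: int = 0

    def failure_rate(self) -> float:
        """Share of meals sent to the pipeline that failed every attempt (assumed denominator)."""
        failed = sum(1 for r in self.excluded.values() if r is Exclusion.SERVER_ERROR)
        return failed / self.sent_to_pipeline if self.sent_to_pipeline else 0.0

def score_meal(meal_id: str, passes_preflight: bool, truth_in_db: bool,
               pipeline: Callable[[], float], notes: RunNotes) -> Optional[float]:
    """Return the kcal estimate, or log the exclusion reason and return None."""
    if not passes_preflight:
        notes.excluded[meal_id] = Exclusion.PREFLIGHT
        return None
    if not truth_in_db:
        notes.excluded[meal_id] = Exclusion.UNKNOWN_INGREDIENT
        return None
    notes.sent_to_pipeline += 1
    for _ in range(MAX_ATTEMPTS):
        try:
            return pipeline()
        except ServerError:
            continue  # retry until the attempt budget is spent
    notes.excluded[meal_id] = Exclusion.SERVER_ERROR
    return None
```

Note that there is deliberately no branch for "the estimate was bad": a poor result is scored like any other.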

We do not exclude meals because Akros performed poorly on them. The temptation is real; the rule above is the discipline.

4. Competitor comparisons

We run head-to-head comparisons only against vendors that publish per-meal accuracy data. If a vendor publishes only a single MAE figure without disclosing its dataset, we cite that figure (with a source link on the headline page) rather than running a head-to-head.

The Cal AI 20-50% MAE range is from an independent journalist test (linked on the headline page). We do not have a per-meal Cal AI distribution to score against ours, and we will not fabricate one.

When a comparable vendor publishes per-meal accuracy data, we will add them to the weekly run with their dataset and our matched filter. Until then, we cite ranges.

5. Confidence intervals

Each weekly MAE is reported with a 95% bootstrap confidence interval over 1,000 resamples of the meals included in that run. The interval is not shown on the headline page, where it would clutter the table for the typical user, but it is available on request from the accuracy_runs.confidence_interval_pct column.
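A percentile bootstrap over meals can be sketched as follows. This assumes the percentile method (the page does not say which bootstrap variant is used), takes per-meal percentage errors as input, and uses a hypothetical fixed `seed` so reruns reproduce the same interval.

```python
import random
from typing import Sequence, Tuple

def bootstrap_ci(errors: Sequence[float], n_resamples: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-meal percentage errors."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample the run's meals with replacement and record each resample's MAE.
    means = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_resamples))
    # Trim the same number of resamples from each tail (25 of 1,000 for a 95% interval).
    k = round((1.0 - level) / 2.0 * n_resamples)
    return means[k], means[n_resamples - 1 - k]
```

Resampling whole meals, rather than components, matches the unit over which the MAE is averaged.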

When this week's interval overlaps the prior week's interval, we describe the change as "within noise" rather than as a regression or an improvement.
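The overlap rule can be sketched as a small labelling function. The page states only the overlap case; labelling non-overlapping intervals as an improvement or a regression (lower MAE being better) is an assumption, as is the `describe_change` name.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper) MAE bounds, in percent

def describe_change(this_week: Interval, last_week: Interval) -> str:
    """Label a week-over-week MAE change using confidence-interval overlap."""
    lo, hi = this_week
    prev_lo, prev_hi = last_week
    if lo <= prev_hi and prev_lo <= hi:
        return "within noise"  # the intervals share at least one point
    # Lower MAE is better: an interval entirely below last week's is an improvement.
    return "improvement" if hi < prev_lo else "regression"
```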

6. What this page is not

Not a peer-reviewed publication. Not a regulated medical-device validation. Not a claim that Akros's calorie estimate is suitable for clinical decision-making. The point of the public benchmark is to expose changes in our own pipeline to the people using it, not to support a clinical assertion.

If you need clinical-grade nutrition tracking, you should weigh your food and work with a registered dietitian. We say this on every paid screen, and we say it here too.

7. Revisions

v1 · 14 May 2026 · initial publication.

Akros is a personal wellness app. It is not a medical device, does not provide medical advice, and is not a substitute for consultation with a licensed clinician.