Constrained YOLO Eval

Run configuration

Checkpoint — prototypes/synthetic_objects_yolo/outputs/phase3/runs/final_5view/weights/best.pt
Confidence threshold — 0.25
Frame cap per clip — 8
Constraint source — /data/assemble-anything/step_part_map.json
Raw JSON — https://workspace.hadsed.com/assemble-anything/reports/constrained-yolo-eval-2026-04-06-results.json
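One plausible reading of this configuration is a two-stage filter. First, detections below the 0.25 confidence threshold are dropped. Then each remaining detection is checked against the design IDs that step_part_map.json lists for the clip's step. The sketch below shows that reading under several assumptions, none confirmed by the report: the JSON schema (a step key mapped to a list of design IDs), the (frame_idx, design_id, conf) tuple format, evenly spaced frame sampling up to the cap, and counting any above-threshold detection outside the expected set as a raw false positive. The names EXAMPLE_MAP, sample_frame_indices and split_detections are hypothetical, and the code that runs the checkpoint is omitted.

```python
import json

# Hypothetical schema for step_part_map.json (assumption): step key -> expected design IDs.
EXAMPLE_MAP = json.loads('{"step_003": ["20482"], "step_007": ["3022", "3710"]}')

CONF_THRESHOLD = 0.25  # value from the run configuration
FRAME_CAP = 8          # value from the run configuration


def sample_frame_indices(n_frames: int, cap: int = FRAME_CAP) -> list[int]:
    """Return at most `cap` evenly spaced frame indices (sampling scheme assumed)."""
    if n_frames <= cap:
        return list(range(n_frames))
    stride = n_frames / cap
    return [int(i * stride) for i in range(cap)]


def split_detections(detections, expected, conf_threshold=CONF_THRESHOLD):
    """Split (frame_idx, design_id, conf) tuples into constrained hits and raw false positives."""
    hits, false_positives = [], []
    for frame_idx, design_id, conf in detections:
        if conf < conf_threshold:
            continue  # below the confidence threshold: discarded entirely
        if design_id in expected:
            hits.append((frame_idx, design_id, conf))
        else:
            false_positives.append((frame_idx, design_id, conf))
    return hits, false_positives


# Usage with made-up detections for a step 7 clip.
dets = [(0, "3022", 0.91), (0, "18986", 0.40), (1, "3710", 0.12)]
hits, fps = split_detections(dets, set(EXAMPLE_MAP["step_007"]))
print(hits)  # [(0, '3022', 0.91)]
print(fps)   # [(0, '18986', 0.4)]
```

Keeping the false positives rather than discarding them is what makes an unconstrained false-positive table like the one later in this report possible.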

Steps with full coverage

2/9

Steps with partial coverage

6/9

Steps with no coverage

1/9

Top raw false positive

18986

126 detections

Per-step summary

Step (clips)	Status	Expected parts	Hit frames (any expected part)	Detected parts (hit frames)	Missing parts
1 step_001_correct_0.mov	partial	30029, 3023	5/7	30029 (5/7)	3023
2 step_002_correct_1.mov; step_002_corrected_2.mov	partial	35480, 36841	2/14	36841 (2/14)	35480
3 step_003_correct_3.mov	none	20482	0/7	none	20482
4 step_004_correct_4.mov	partial	3710, 99781	3/7	99781 (3/7)	3710
5 step_005_correct_5.mov	full	3023, 3666	4/7	3023 (2/7), 3666 (4/7)	—
6 step_006_correct_6.mov	full	36840	3/7	36840 (3/7)	—
7 step_007_correct_7.mov	partial	3022, 3710	7/7	3022 (7/7)	3710
8 step_008_correct_8.mov	partial	28802, 3024	1/7	28802 (1/7)	3024
9 step_009_correct_9.mov	partial	4032a, 99780	1/7	4032a (1/7)	99780
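The Status column and the coverage counts at the top of the report can be rederived from the per-part hit counts in this table. A minimal sketch, assuming status is decided per step by whether each expected part is detected in at least one frame. That rule matches every row (step 5 is full even though 3023 appears in only 2/7 frames). STEP_PART_HITS, coverage_status and missing_parts are hypothetical names.

```python
from collections import Counter

# Hit frames per expected part, transcribed from the per-step table.
STEP_PART_HITS = {
    1: {"30029": 5, "3023": 0},
    2: {"35480": 0, "36841": 2},
    3: {"20482": 0},
    4: {"3710": 0, "99781": 3},
    5: {"3023": 2, "3666": 4},
    6: {"36840": 3},
    7: {"3022": 7, "3710": 0},
    8: {"28802": 1, "3024": 0},
    9: {"4032a": 1, "99780": 0},
}


def coverage_status(part_hits: dict[str, int]) -> str:
    """'full' if every expected part is hit in some frame, 'none' if no part is, else 'partial'."""
    hit = [n > 0 for n in part_hits.values()]
    if all(hit):
        return "full"
    if not any(hit):
        return "none"
    return "partial"


def missing_parts(part_hits: dict[str, int]) -> list[str]:
    """Expected parts never detected in any of the step's frames."""
    return [part for part, n in part_hits.items() if n == 0]


statuses = {step: coverage_status(h) for step, h in STEP_PART_HITS.items()}
summary = Counter(statuses.values())
for status in ("full", "partial", "none"):
    print(f"{status}: {summary[status]}/{len(STEP_PART_HITS)}")
# prints full: 2/9, then partial: 6/9, then none: 1/9
```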

Ranked actionable error analysis

Highest-impact misses to fix first

3710: hit rate 0/14 (0%); blocking steps 4, 7; recoverable missed step-frames: 14.
3023: hit rate 2/14 (14%); blocking steps 1, 5; recoverable missed step-frames: 12.
35480: hit rate 0/14 (0%); blocking step 2; recoverable missed step-frames: 14.
36841: hit rate 2/14 (14%); blocking step 2; recoverable missed step-frames: 12.
20482: hit rate 0/7 (0%); blocking step 3; recoverable missed step-frames: 7.
3024: hit rate 0/7 (0%); blocking step 8; recoverable missed step-frames: 7.
99780: hit rate 0/7 (0%); blocking step 9; recoverable missed step-frames: 7.
28802: hit rate 1/7 (14%); blocking step 8; recoverable missed step-frames: 6.

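Read: classes are ranked first by the number of steps they block, then by missed step-frames. A class counts as blocking a step when it misses frames in that step, even if the step still reaches full coverage (3023 in step 5). That makes 3710 and 3023 the highest-leverage fixes, because each one hurts two steps.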
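The ranking rule can be reproduced from the per-step table. A minimal sketch, assuming a class blocks a step whenever it misses at least one frame there, and assuming ties are broken by the earliest blocked step. Under those assumptions it returns the same top eight as the list above. STEPS and rank_misses are hypothetical names.

```python
from collections import defaultdict

# (frames evaluated, {expected part: hit frames}) per step, transcribed from the per-step table.
STEPS = {
    1: (7, {"30029": 5, "3023": 0}),
    2: (14, {"35480": 0, "36841": 2}),
    3: (7, {"20482": 0}),
    4: (7, {"3710": 0, "99781": 3}),
    5: (7, {"3023": 2, "3666": 4}),
    6: (7, {"36840": 3}),
    7: (7, {"3022": 7, "3710": 0}),
    8: (7, {"28802": 1, "3024": 0}),
    9: (7, {"4032a": 1, "99780": 0}),
}


def rank_misses(steps):
    """Rank expected classes by blocked steps, then missed step-frames, then earliest blocked step."""
    hits = defaultdict(int)
    frames = defaultdict(int)
    blocked = defaultdict(list)
    for step, (n_frames, part_hits) in sorted(steps.items()):
        for part, n_hit in part_hits.items():
            hits[part] += n_hit
            frames[part] += n_frames
            if n_hit < n_frames:  # assumed meaning of "blocking": at least one missed frame
                blocked[part].append(step)
    rows = [
        (part, hits[part], frames[part], blocked_steps, frames[part] - hits[part])
        for part, blocked_steps in blocked.items()
    ]
    # More blocked steps first, then more missed frames, then the earliest blocked step.
    rows.sort(key=lambda r: (-len(r[3]), -r[4], r[3][0]))
    return rows


for part, n_hit, n_frames, blocked_steps, missed in rank_misses(STEPS)[:8]:
    steps_txt = ", ".join(map(str, blocked_steps))
    print(f"{part}: hit rate {n_hit}/{n_frames} ({n_hit / n_frames:.0%}); "
          f"blocking steps {steps_txt}; missed step-frames: {missed}")
```

Class 3022 drops out entirely because it is hit in 7/7 frames of its only step.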

Raw false positives (unconstrained detector behavior)

Design ID	Count	Share of top-10 FP mass
18986	126	31.6%
3022	93	23.3%
6806	47	11.8%
30029	34	8.5%
36840	26	6.5%
3005	18	4.5%
28974	17	4.3%
4032a	13	3.3%
20482	13	3.3%
4006	12	3.0%

The top-10 false-positive classes account for 399 raw detections, and 18986 and 3022 alone contribute 219 of them (54.9%). These classes are the main source of review noise and should be suppressed or rebalanced during retraining.
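The share column divides each class's count by the top-10 total of 399, not by all raw false positives. A short sketch of that arithmetic, with counts copied from the table; FP_COUNTS and fp_shares are illustrative names.

```python
FP_COUNTS = {
    "18986": 126, "3022": 93, "6806": 47, "30029": 34, "36840": 26,
    "3005": 18, "28974": 17, "4032a": 13, "20482": 13, "4006": 12,
}


def fp_shares(counts: dict[str, int], top_k: int = 10):
    """Return (top-k total, [(design_id, count, percent of top-k total)]) sorted by count."""
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(n for _, n in top)
    return total, [(design_id, n, round(100 * n / total, 1)) for design_id, n in top]


total, rows = fp_shares(FP_COUNTS)
print(total)  # 399
# Prints "18986 126 31.6%" and then "3022 93 23.3%".
for design_id, n, pct in rows[:2]:
    print(design_id, n, f"{pct}%")
combined = rows[0][1] + rows[1][1]
print(combined, round(100 * combined / total, 1))  # 219 54.9
```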

Bottom line

Actionable next moves

Fix class 3710 first — 0/14 hits across steps 4 and 7, the broadest blocker.
Then fix class 3023 — only 2/14 hits across steps 1 and 5.
Settle the step 2 clip policy: the original and corrected clips (step_002_correct_1.mov, step_002_corrected_2.mov) are both included, which doubles exposure to 14 frames for classes 35480 and 36841.
Reduce FP classes 18986 and 3022 — together they contribute 219 raw false detections.
Re-run after targeted retraining on the expected classes with zero hits: 3710, 35480, 20482, 3024, 99780.