Validation Prediction Review

Scoring method: Each image is scored by its F1 score at IoU ≥ 0.5: F1 = 2·TP / (2·TP + FP + FN). A predicted box is a True Positive (TP) if it overlaps a ground-truth (GT) box of the same class with IoU ≥ 0.5. Matching is greedy by highest IoU and one-to-one: each prediction is matched to at most one GT box, and each GT box to at most one prediction. Unmatched predictions are False Positives (FP); unmatched GT boxes are False Negatives (FN). Images with no GT boxes and no predictions, where the formula would give 0/0, receive F1 = 1.0 (vacuously perfect); images with predictions but no GT receive F1 = 0.0, which is also what the formula gives.
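A minimal Python sketch of the scoring rule above, assuming boxes are (x1, y1, x2, y2) corner tuples paired with a class label. The names `iou` and `image_f1` are hypothetical, and the report only says matching is greedy by highest IoU, so the tie-breaking here (and whether the real pipeline also uses prediction confidence) is an assumption.

```python
from typing import List, Tuple

# Assumed formats (not specified by the report): boxes are (x1, y1, x2, y2)
# corners, and each detection is a (box, class_label) pair.
Box = Tuple[float, float, float, float]
Det = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def image_f1(preds: List[Det], gts: List[Det], thr: float = 0.5) -> Tuple[float, int, int, int]:
    """Score one image; returns (F1, TP, FP, FN)."""
    if not preds and not gts:
        return 1.0, 0, 0, 0  # no GT and no predictions: vacuously perfect
    # Collect every same-class (prediction, GT) pair that clears the threshold.
    pairs = []
    for i, (pb, pc) in enumerate(preds):
        for j, (gb, gc) in enumerate(gts):
            if pc == gc:
                v = iou(pb, gb)
                if v >= thr:
                    pairs.append((v, i, j))
    pairs.sort(reverse=True)  # greedy: consider the highest IoU first
    matched_p, matched_g = set(), set()
    for _, i, j in pairs:
        # One-to-one: skip a pair if its prediction or GT box is already taken.
        if i not in matched_p and j not in matched_g:
            matched_p.add(i)
            matched_g.add(j)
    tp = len(matched_p)
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # unmatched GT boxes
    return 2 * tp / (2 * tp + fp + fn), tp, fp, fn


# Hypothetical example: "bolt" and "nut" are made-up class names.
gts = [((0, 0, 10, 10), "bolt"), ((20, 20, 30, 30), "nut")]
preds = [((1, 1, 10, 10), "bolt"), ((50, 50, 60, 60), "nut")]
print(image_f1(preds, gts))  # (0.5, 1, 1, 1)
```

Note that a box with the right location but the wrong class is never matched, so it counts once as an FP and its GT box counts once as an FN.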

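Best examples have the highest F1 (0.750 down to 0.500 in this set); even these miss some parts, for example the top image matches only 3 of its 5 GT boxes. Worst examples have the lowest F1 (0.000 to 0.133): at most one GT box per image is matched, so nearly all parts are missed or mislabelled.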
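The best/worst selection can be sketched as a sort over per-image F1 scores. The helper `best_and_worst` is hypothetical, and the report does not state its tie-break rule; sorting ascending by (F1, filename) and reading the list backwards for the best examples is one rule consistent with the tie order in the lists below (Best #4 to #6, Worst #1 to #6, Worst #8 and #9).

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def best_and_worst(scores: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (best k, worst k) as (filename, F1) pairs."""
    # Ascending by F1, ties broken by filename (an assumed tie-break rule).
    ranked = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    worst = ranked[:k]
    best = ranked[::-1][:k]  # highest F1 first; ties in descending filename order
    return best, worst


# A few per-image scores taken from this report.
scores = {
    "scene_047_view_04.jpg": 0.750,
    "scene_047_view_05.jpg": 0.600,
    "scene_047_view_02.jpg": 0.600,
    "scene_047_view_01.jpg": 0.600,
    "scene_045_view_02.jpg": 0.000,
    "scene_045_view_01.jpg": 0.000,
}
best, worst = best_and_worst(scores, k=3)
print([name for name, _ in best])
# ['scene_047_view_04.jpg', 'scene_047_view_05.jpg', 'scene_047_view_02.jpg']
print([name for name, _ in worst])
# ['scene_045_view_01.jpg', 'scene_045_view_02.jpg', 'scene_047_view_01.jpg']
```

If the set has fewer than 2k images, the two lists will share entries, as they do in this small example.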

Best #1 scene_047_view_04.jpg F1 0.750

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 5 Pred: 3 TP: 3 FP: 0 FN: 2

Best #2 scene_047_view_03.jpg F1 0.667

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 5 Pred: 4 TP: 3 FP: 1 FN: 2

Best #3 scene_049_view_08.jpg F1 0.625

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 8 Pred: 8 TP: 5 FP: 3 FN: 3

Best #4 scene_047_view_05.jpg F1 0.600

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 6 Pred: 4 TP: 3 FP: 1 FN: 3

Best #5 scene_047_view_02.jpg F1 0.600

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 6 Pred: 4 TP: 3 FP: 1 FN: 3

Best #6 scene_047_view_01.jpg F1 0.600

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 5 Pred: 5 TP: 3 FP: 2 FN: 2

Best #7 scene_048_view_09.jpg F1 0.571

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 11 Pred: 10 TP: 6 FP: 4 FN: 5

Best #8 scene_042_view_07.jpg F1 0.538

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 16 Pred: 10 TP: 7 FP: 3 FN: 9

Best #9 scene_049_view_04.jpg F1 0.533

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 8 Pred: 7 TP: 4 FP: 3 FN: 4

Best #10 scene_048_view_04.jpg F1 0.500

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 10 Pred: 6 TP: 4 FP: 2 FN: 6
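As a sanity check, the F1 values in the Best 10 list can be recomputed from the TP, FP and FN counts printed with each image. The sketch below copies those counts verbatim and assumes the reported scores are rounded to three decimals; with the values as printed, all ten agree.

```python
# (filename, reported F1, TP, FP, FN), copied from the Best 10 list above.
BEST = [
    ("scene_047_view_04.jpg", 0.750, 3, 0, 2),
    ("scene_047_view_03.jpg", 0.667, 3, 1, 2),
    ("scene_049_view_08.jpg", 0.625, 5, 3, 3),
    ("scene_047_view_05.jpg", 0.600, 3, 1, 3),
    ("scene_047_view_02.jpg", 0.600, 3, 1, 3),
    ("scene_047_view_01.jpg", 0.600, 3, 2, 2),
    ("scene_048_view_09.jpg", 0.571, 6, 4, 5),
    ("scene_042_view_07.jpg", 0.538, 7, 3, 9),
    ("scene_049_view_04.jpg", 0.533, 4, 3, 4),
    ("scene_048_view_04.jpg", 0.500, 4, 2, 6),
]


def f1(tp: int, fp: int, fn: int) -> float:
    """Per-image F1 = 2*TP / (2*TP + FP + FN); callers must avoid 0/0."""
    return 2 * tp / (2 * tp + fp + fn)


for name, reported, tp, fp, fn in BEST:
    recomputed = f1(tp, fp, fn)
    # The report prints F1 to three decimals, so compare after rounding.
    status = "ok" if round(recomputed, 3) == reported else "MISMATCH"
    print(f"{name}  reported {reported:.3f}  recomputed {recomputed:.4f}  {status}")
```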

Worst #1 scene_045_view_01.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 5 TP: 0 FP: 5 FN: 9

Worst #2 scene_045_view_02.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 7 TP: 0 FP: 7 FN: 9

Worst #3 scene_045_view_03.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 8 Pred: 5 TP: 0 FP: 5 FN: 8

Worst #4 scene_045_view_05.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 5 TP: 0 FP: 5 FN: 9

Worst #5 scene_045_view_06.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 4 TP: 0 FP: 4 FN: 9

Worst #6 scene_045_view_09.jpg F1 0.000

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 4 TP: 0 FP: 4 FN: 9

Worst #7 scene_045_view_04.jpg F1 0.095

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 12 TP: 1 FP: 11 FN: 8

Worst #8 scene_044_view_01.jpg F1 0.118

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 13 Pred: 4 TP: 1 FP: 3 FN: 12

Worst #9 scene_045_view_08.jpg F1 0.118

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 8 TP: 1 FP: 7 FN: 8

Worst #10 scene_045_view_07.jpg F1 0.133

[Figure panels: Original | Ground Truth (detected / missed) | Prediction (TP / FP)]

GT: 9 Pred: 6 TP: 1 FP: 5 FN: 8
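The counts in the Worst 10 list can be cross-checked: one-to-one matching implies GT = TP + FN and Pred = TP + FP, and each F1 should follow from TP, FP and FN. The sketch below copies the counts verbatim, assumes three-decimal rounding of the reported F1, and raises an AssertionError naming the first inconsistent image; with the values as printed, it passes.

```python
# (filename, reported F1, GT, Pred, TP, FP, FN), copied from the Worst 10 list above.
WORST = [
    ("scene_045_view_01.jpg", 0.000, 9, 5, 0, 5, 9),
    ("scene_045_view_02.jpg", 0.000, 9, 7, 0, 7, 9),
    ("scene_045_view_03.jpg", 0.000, 8, 5, 0, 5, 8),
    ("scene_045_view_05.jpg", 0.000, 9, 5, 0, 5, 9),
    ("scene_045_view_06.jpg", 0.000, 9, 4, 0, 4, 9),
    ("scene_045_view_09.jpg", 0.000, 9, 4, 0, 4, 9),
    ("scene_045_view_04.jpg", 0.095, 9, 12, 1, 11, 8),
    ("scene_044_view_01.jpg", 0.118, 13, 4, 1, 3, 12),
    ("scene_045_view_08.jpg", 0.118, 9, 8, 1, 7, 8),
    ("scene_045_view_07.jpg", 0.133, 9, 6, 1, 5, 8),
]

for name, reported, gt, pred, tp, fp, fn in WORST:
    # Every GT box is either matched or missed; every prediction is
    # either matched or a false positive.
    assert gt == tp + fn and pred == tp + fp, name
    recomputed = 2 * tp / (2 * tp + fp + fn)
    # The report prints F1 to three decimals.
    assert round(recomputed, 3) == reported, name
    print(f"{name}  F1 {recomputed:.3f}  ok")
```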

Aggregate Metrics — Validation Set

Best 10 Examples

Worst 10 Examples