Now updated with real runs for the first two recommended experiments: coordinate calibration and crop-based zoom search on the actual drone frame.

| Model | Calibration board | One-shot fastener search | Tiled crop-search | Takeaway |
|---|---|---|---|---|
| gemini-2.5-pro | 5/5 landmarks exact; 0 px mean error | 12 candidate fasteners; mostly symmetric central structure | 27 merged candidates; many more peripheral / local-detail hits | Good raw grounding + crop search expands coverage, but likely overcalls. |
| gemini-robotics-er-1.5-preview | 5/5 landmarks exact; 0 px mean error | 8 candidate fasteners; coarser / more selective than 2.5 Pro | 27 merged candidates; similar expansion under crop-search | Also understands pixel frame correctly on calibration; crop-based prompting materially changes behavior. |

Important caveat: these counts are not ground-truth precision/recall numbers yet. They are behavior probes.
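
For reference, the "merged candidates" step in a tiled crop-search can be sketched roughly as follows. This is a minimal illustration, not the code used for the runs above: the tile size, overlap, merge radius, and the function names `tile_boxes`, `to_full_frame`, and `merge_candidates` are all assumptions made for the example, and the greedy distance-based merge is just one simple way to collapse duplicate hits from overlapping crops.

```python
import math

def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) crop boxes covering the frame.

    Hypothetical helper: tile and overlap are in pixels, with overlap < tile.
    """
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes

def to_full_frame(point, box):
    """Shift a crop-local (x, y) pixel point into full-frame coordinates."""
    x, y = point
    left, top, _, _ = box
    return (x + left, y + top)

def merge_candidates(points, radius):
    """Greedy de-duplication: keep a point only if no kept point is within radius."""
    merged = []
    for p in points:
        if all(math.dist(p, q) > radius for q in merged):
            merged.append(p)
    return merged

# Example: a 1000x400 frame with 400 px tiles and 100 px overlap.
boxes = tile_boxes(1000, 400, tile=400, overlap=100)
# Pretend the same fastener was reported in two overlapping crops.
hits = [to_full_frame((350, 50), boxes[0]), to_full_frame((52, 51), boxes[1])]
print(merge_candidates(hits, radius=10))
```

Note that the merge radius directly affects the merged count: a radius that is too small keeps duplicates from overlapping tiles, which is one way a crop-search total can overstate the number of distinct parts.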
Both models returned the exact centers for all five labeled circles on the synthetic calibration image.
This is good news: the weirdness we saw earlier is probably not a generic “Gemini does not know image coordinates” problem. It is more likely prompt/task-specific or scene-specific.
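
A calibration score like the "0 px mean error" above can be computed along these lines. This is a sketch under stated assumptions: it assumes the model returns points as `[y, x]` normalized to a 0-1000 range (the convention described in the Gemini pointing documentation; adjust `scale` and the axis order if the prompt asks for raw pixels), and the helper names `to_pixels` and `mean_pixel_error` are hypothetical.

```python
import math

def to_pixels(point_yx_norm, width, height, scale=1000):
    """Convert an assumed [y, x] point on a 0..scale grid to (x, y) pixels."""
    y, x = point_yx_norm
    return (x / scale * width, y / scale * height)

def mean_pixel_error(predicted, truth):
    """Mean Euclidean distance in pixels over all labeled landmarks.

    Both arguments map landmark name -> (x, y) pixel coordinates.
    Raises KeyError if a landmark in truth is missing from predicted.
    """
    errors = [math.dist(predicted[name], truth[name]) for name in truth]
    return sum(errors) / len(errors)

# Illustrative values only, not the real calibration board.
truth = {"A": (100, 200), "B": (400, 300)}
predicted = {"A": (103, 204), "B": (400, 300)}
print(mean_pixel_error(predicted, truth))  # 2.5
```

Getting the axis order or normalization wrong produces large, systematic errors, so a clean result on the synthetic board is a useful sanity check on the parsing step as well as on the model.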
Still worth running next. This would separate “bad localization” from “category confusion among tiny metal parts.”
Still pending. Important if the end goal is stable part identity across multiple views rather than single-image marking.
Still pending. Useful for distinguishing “I see it” from “I infer it’s behind the occluder.”