Deep Research Report

Query: Comprehensive survey of assembly task research in AI/ML from 2022-2026. Cover: (1) Assembly datasets - furniture assembly, mechanical assembly, industrial assembly, instruction-following datasets, video datasets showing assembly processes. (2) Video models for assembly - action recognition, step detection, procedure learning, video transformers applied to assembly. (3) Foundation models for assembly - LLMs/VLMs for assembly planning, multimodal models that understand assembly instructions, robot learning from assembly demonstrations. (4) Arxiv papers on assembly sequence planning, assembly state estimation, part detection for assembly. Include dataset names, sizes, download links, benchmark results, and code repositories.

Model: o4-mini-deep-research
Date: 2026-03-25
Searches performed: 68
Sources cited: 30

---

Assembly Datasets

Video Models for Assembly

Foundation Models for Assembly

Sequence Planning, State Estimation, and Part Detection (ArXiv)

Resources: All papers above are on ArXiv (links in citations). Whenever available, dataset download pages or GitHub repos are cited above. For example, ATTACH and MECCANO datasets have public webpages (arxiv.org) (arxiv.org); LEGO-ECA data and code are on the SCANet site (arxiv.org); error-detection code from Lehman et al. is linked in their paper (arxiv.org). Each entry above includes reference citations for the results or dataset statistics.

---

Sources

1. Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
2. AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
3. ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding
4. MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
5. FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation
6. AssembleRL: Learning to Assemble Furniture from Their Point Clouds
7. REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly
8. Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
9. IKEA-Manual: Seeing Shape Assembly Step by Step
10. IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
11. SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
12. SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
13. Video Swin Transformer (2021-06-24)
14. Video Action Transformer Network
15. How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks
16. Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies
17. Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models
18. Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models
19. Robots receive major intelligence boost thanks to Google DeepMind's 'thinking AI' - a pair of models that help machines understand the world
20. Microsoft unveils first robotics model targeted at boosting physical AI in a bid to free robots from the production line
21. Planning Assembly Sequence with Graph Transformer
22. ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility
23. Physics-Aware Combinatorial Assembly Sequence Planning using Data-free Action Masking
24. Subassembly to Full Assembly: Effective Assembly Sequence Planning through Graph-based Reinforcement Learning
25. Rearrangement Planning for General Part Assembly
26. Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning
27. Supervised Representation Learning towards Generalizable Assembly State Recognition
28. ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation
29. Find the Assembly Mistakes: Error Segmentation for Industrial Applications
30. IKEA-Manual: Seeing Shape Assembly Step by Step