ZeroShop: Automated Metric Mesh Generation for Zero-Shot 6D Object Pose Estimation
Abstract
Robotic manipulation of unseen objects relies on zero-shot 6D pose estimation, which typically requires a 3D mesh as a reference. Constructing accurate meshes traditionally demands specialized scanning hardware and manual editing, but recently proposed Novel View Synthesis (NVS) techniques, such as 2D Gaussian Splatting (2DGS) and Sparse Voxels Rasterization (SVRaster), produce accurate surface reconstructions as a byproduct, potentially eliminating the need for such equipment. This work presents an automated image-based mesh generation pipeline that integrates object segmentation, camera registration, point cloud generation, metric height estimation, and NVS mesh generation, requiring neither expensive hardware nor human intervention. Leveraging 2DGS and SVRaster with MASt3R-SfM or the Visual Geometry Grounded Transformer (VGGT), the pipeline produces accurate meshes in minutes, with the VGGT/SVRaster combination reducing reconstruction time to seconds. Grounding near-view object-centric images with far-view scene-scanning images using MASt3R yields consistent object height estimates. On the BOP YCB-V benchmark, meshes generated with our pipeline achieve performance competitive with state-of-the-art zero-shot pose estimation methods. Real-world robotic grasping experiments further indicate robust performance even under moderate scale errors. The source code is available at https://github.com/St333fan/meshgen-zeroshop.
How to Cite:
Lechner, S., Ausserlechner, P. & Vincze, M. (2026) “ZeroShop: Automated Metric Mesh Generation for Zero-Shot 6D Object Pose Estimation”, Proceedings of the Austrian Symposium on AI, Robotics, and Vision 3(1), 162–175.
