MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model

Abstract

Garment folding is a common yet challenging task in robotic manipulation. The deformability of garments leads to a vast state space and complex dynamics, which complicates precise and fine-grained manipulation. In this paper, we present MetaFold, a unified framework that disentangles task planning from action prediction and learns each independently to enhance model generalization. It employs language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure facilitates multi-category learning, enabling the model to adapt flexibly to various user instructions and folding tasks. We also construct a large-scale MetaFold dataset comprising folding point cloud trajectories for a total of 1210 garments across multiple categories, each paired with corresponding language annotations. Extensive experiments demonstrate the superiority of our proposed framework. Supplementary materials are available on our website: https://meta-fold.github.io/.

Publication
In IEEE/RSJ International Conference on Intelligent Robots and Systems 2025
Jiaqi Huang
Senior CS Student at Nanjing University
Research Intern at University of Illinois at Urbana-Champaign

My research interests include system reliability & security, and AI for systems.