Multimodal Robot Learning with Vision-Language Models

PyTorch · ROS2 · SAM · LLMs · Isaac Gym

A system that integrates vision-language models, segmentation, and reinforcement learning for robotic manipulation. Natural-language commands drive the robot while it interprets visual scenes and adapts to new environments.
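To make the command-to-action flow concrete, here is a minimal sketch of how a natural-language command might be grounded in a segmented scene. All class and function names (`SceneObject`, `parse_command`, `plan`) are illustrative placeholders, not the project's actual API; a real pipeline would use a language model for parsing and a segmentation model such as SAM to produce masks.

```python
# Hypothetical sketch: map a language command onto a segmented scene.
# Names here are placeholders, not the project's real interfaces.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    mask_id: int  # would come from a segmentation model such as SAM

def parse_command(command, known_actions=("pick", "place", "push")):
    """Naively split a command like 'pick red cube' into (action, target)."""
    words = command.lower().split()
    action = next((w for w in words if w in known_actions), None)
    target = " ".join(w for w in words if w not in known_actions)
    return action, target

def plan(command, scene):
    """Resolve the command's target against the segmented scene objects."""
    action, target = parse_command(command)
    obj = scene.get(target)
    if action is None or obj is None:
        return None  # command not understood or target not in view
    return {"action": action, "target": obj.name, "mask_id": obj.mask_id}

scene = {"red cube": SceneObject("red cube", 3)}
print(plan("pick red cube", scene))
```

In a full system, `parse_command` would be replaced by an LLM call and `scene` would be built from per-object segmentation masks, but the overall shape of the loop stays the same.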

Features
