The Massachusetts Institute of Technology (MIT) unveiled a new method for training robots last week that uses generative artificial intelligence (AI) models. The new technique integrates data across different domains and modalities, unifying it into a common language that can then be processed by large language models (LLMs). MIT researchers claim that this method could lead to the emergence of general-purpose robots that can handle a wide range of tasks without each skill having to be trained individually from scratch.
Researchers at MIT are developing a generative AI-based method for training robots
In a news release, MIT detailed the new methodology for training robots. Currently, teaching a robot a specific task is a challenging proposition, as it requires a large amount of simulation and real-world data. This is essential because if a robot does not understand how to perform a task in a particular environment, it will struggle to adapt to it.
This means that each new task requires fresh datasets covering both simulated and real-world scenarios. The robot then undergoes a training period in which procedures are refined and errors and glitches are eliminated. As a result, robots are generally trained for a single specific task, and the multi-purpose robots seen in science fiction films remain absent from reality.
However, the researchers at MIT claim their new technique bypasses this challenge. In a paper published on the preprint server arXiv (note: it has not been peer-reviewed), the scientists argue that generative AI could help solve this problem.
To this end, data from different domains, such as simulations and real robots, and from different modalities, such as vision sensors and robotic-arm position encoders, is unified into a common language that can be processed by an AI model. The researchers also developed a new architecture, called a heterogeneous pre-trained transformer (HPT), to unify the data.
Interestingly, the study’s lead author, Lirui Wang, an Electrical Engineering and Computer Science (EECS) graduate student, said the inspiration for this technique was drawn from large AI models such as OpenAI’s GPT-4.
The researchers placed a transformer model (similar to the GPT architecture) at the core of their system, where it processes vision and proprioception (the sense of self-motion, force, and position) inputs.
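The key idea of unifying heterogeneous inputs is that each modality gets its own small encoder (a "stem") that projects its data into tokens of a shared width, so one transformer-style trunk can consume them all as a single sequence. The sketch below illustrates only that tokenization step, in plain Python with fixed random linear maps standing in for learned weights; the dimensions, function names, and the stems themselves are illustrative assumptions, not the authors' implementation.

```python
import random

EMBED_DIM = 8  # shared token width (illustrative choice, not from the paper)

def make_projection(in_dim, out_dim, seed):
    # Fixed random linear map standing in for a learned modality "stem".
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(vec, weights):
    # Matrix-vector product: maps a modality-specific vector to a shared-width token.
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

# One stem per modality, each mapping its native dimensionality into the shared space.
vision_stem = make_projection(16, EMBED_DIM, seed=0)   # e.g. flattened image features
proprio_stem = make_projection(4, EMBED_DIM, seed=1)   # e.g. joint-position encoder readings

def tokenize(vision_feat, proprio_feat):
    # Both modalities become tokens of identical width, so a single
    # transformer trunk could process them as one sequence.
    return [project(vision_feat, vision_stem), project(proprio_feat, proprio_stem)]

tokens = tokenize([0.5] * 16, [0.1, 0.2, 0.3, 0.4])
```

In a real system the stems would be trained jointly with the trunk, and the token sequence would be fed through attention layers; the point here is only that once every modality is mapped to the same token width, the downstream model no longer needs to know where each token came from.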
MIT researchers say this new method could make training robots faster and less expensive than traditional approaches, largely because it requires less task-specific data to train a robot on different tasks. Furthermore, the study found that this method outperformed training from scratch by more than 20 percent in both simulation and real-world experiments.