Companies like OpenAI and Midjourney are building chatbots, image generators, and other artificial intelligence tools that work in the digital world.
Now, a startup founded by three former OpenAI researchers is using the same development methods that power chatbots to build AI technology that can navigate the physical world.
Covariant, a robotics company headquartered in Emeryville, Calif., is developing ways for robots to pick up, move and sort items as those items are shuffled through warehouses and distribution centers. The goal is to help robots understand what's happening around them and decide what to do next.
The technology also gives robots a broad understanding of the English language, letting people chat with them much as they would chat with ChatGPT.
The technology is still under development and does not work perfectly. But it is a clear sign that the artificial intelligence systems that power online chatbots and image generators will also power machines in warehouses, on roads and in homes.
Like chatbots and image generators, this robotic technology learns skills by analyzing vast amounts of digital data. This means engineers can improve the technology by feeding it more data.
Covariant, which is backed by $222 million in funding, does not manufacture robots. It builds the software that powers them. The company aims to bring its new technology to warehouse robots and to provide a road map for other companies to do much the same in manufacturing plants and even on roads with driverless cars.
The AI systems that power chatbots and image generators are called neural networks, named after the web of neurons in the brain.
By identifying patterns in vast amounts of data, these systems can learn to recognize words, sounds and images, and even generate them on their own. This is how OpenAI built ChatGPT, which can instantly answer questions, write term papers and generate computer programs. It learned those skills by analyzing vast amounts of text gathered from across the internet. (Several media outlets, including The New York Times, have sued OpenAI for copyright infringement.)
Companies are now building systems that can learn from different kinds of data at the same time. By analyzing both a collection of photos and the captions that describe those photos, for example, a system can figure out the relationships between the two. It can learn that the word “banana” refers to a curved yellow fruit.
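To make that idea concrete, here is a minimal sketch of the kind of contrastive image-and-caption training popularized by models such as OpenAI's CLIP. The encoder stubs, feature sizes and temperature value are illustrative assumptions, not details of any company's actual system.

```python
import torch
import torch.nn.functional as F

# Stand-in encoders: in a real system these would be a vision model
# and a language model; simple linear layers suffice for illustration.
image_encoder = torch.nn.Linear(2048, 512)  # assumed image-feature size
text_encoder = torch.nn.Linear(768, 512)    # assumed text-feature size

def contrastive_loss(image_feats, text_feats):
    # Project both modalities into one shared embedding space.
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    # Similarity between every image and every caption in the batch.
    logits = img @ txt.T / 0.07  # 0.07 is a commonly used temperature
    # The i-th image belongs with the i-th caption, so the "answer"
    # for each row (and each column) is its own index.
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# One training step pulls matching image/caption pairs together and
# pushes mismatched pairs apart, which is how a model can come to
# associate the word "banana" with a curved yellow fruit.
loss = contrastive_loss(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```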
OpenAI used that kind of system to build Sora, its new video generator. By analyzing thousands of captioned videos, the system learned to generate videos of its own when given a short description of a scene, such as “a gorgeously rendered papercraft world of coral reefs filled with colorful fish and marine life.”
Covariant, founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, used similar techniques to build a system that powers warehouse robots.
The company helps operate sorting robots in warehouses around the world, and it has spent years gathering data, from cameras and other sensors, that shows how those robots operate.
“It captures all kinds of data that are important to robots. It helps them understand and interact with the physical world,” Dr. Chen said.
By combining that data with the vast amounts of text used to train chatbots like ChatGPT, the company has built AI technology that gives robots a broader understanding of the world around them.
After identifying patterns in this stew of images, sensory data, and text, the technology empowers robots to deal with unexpected situations in the physical world. The robot knows how to pick up a banana, even if it has never seen one before.
Like a chatbot, the system can also respond to plain English. If you say, “Pick up a banana,” it understands what that means. If you say, “Pick up the yellow fruit,” it understands that, too.
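As a rough illustration of how a typed or spoken command might be matched against the objects a robot can see, here is a hypothetical sketch. The `embed` function is a toy stand-in for a learned text embedding, and the scene descriptions are invented; Covariant has not published the interface its system uses.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a learned text embedding: a bag of words.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def choose_target(instruction: str, visible_objects: list[str]) -> str:
    # Ground the command by picking the object whose description
    # best matches the instruction.
    query = embed(instruction)
    return max(visible_objects, key=lambda o: similarity(query, embed(o)))

# Both phrasings resolve to the same object, which is the point:
# the robot matches meaning, not memorized commands.
scene = ["curved yellow fruit banana", "red apple", "cardboard box"]
print(choose_target("pick up a banana", scene))          # banana entry
print(choose_target("pick up the yellow fruit", scene))  # banana entry
```

With a real learned embedding, “yellow fruit” would match “banana” even without any shared words; the toy version needs the overlap spelled out in the scene descriptions.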
The technology can also generate a video that predicts what will happen when the robot tries to pick up a banana. These videos aren't really useful in a warehouse, but they show that the robot understands what's around it.
“If we can predict the next frame of a video, we can pinpoint the appropriate strategy to follow,” Dr. Abbeel said.
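Dr. Abbeel's remark describes a planning loop often called model-predictive control: imagine the outcome of each candidate action with a predictive model, score those outcomes against the goal, and execute the best one. The sketch below is a generic illustration of that idea; `predict_frames` and `goal_score` are hypothetical stand-ins, not Covariant's RFM interface.

```python
import random

def predict_frames(state, action, horizon=5):
    # Hypothetical stand-in for a learned video model that rolls the
    # world forward; here it just nudges a scalar state each step.
    frames = [state]
    for _ in range(horizon):
        frames.append(frames[-1] + action + random.gauss(0, 0.1))
    return frames

def goal_score(frames, goal):
    # Higher is better: how close the final predicted frame is to the goal.
    return -abs(frames[-1] - goal)

def plan(state, goal, candidate_actions):
    # Try each candidate action in imagination and keep the one whose
    # predicted future scores best against the goal.
    return max(candidate_actions,
               key=lambda a: goal_score(predict_frames(state, a), goal))

# The robot "imagines" the outcome of each action before moving.
best = plan(state=0.0, goal=2.0, candidate_actions=[-0.5, 0.0, 0.5])
print(f"chosen action: {best}")  # 0.5, which drifts toward the goal
```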
The technology, which Covariant calls RFM, for robotics foundation model, makes mistakes, much as chatbots do. It often understands what people want, but there is always a chance that it won't. It drops objects from time to time.
Gary Marcus, an AI entrepreneur and professor emeritus of psychology and neuroscience at New York University, said the technology could be useful in warehouses and other situations where mistakes are tolerable. But he said it would be more difficult and riskier to deploy it in manufacturing plants and other potentially dangerous settings.
“In the end, it comes down to the cost of mistakes,” he said. “If you have a 150-pound robot that can do harmful things, the cost can be high.”
Researchers believe these systems will improve rapidly as companies train them on increasingly larger and more diverse collections of data.
That's very different from how robots operated in the past. Typically, engineers programmed robots to perform the same precise movement over and over again, such as lifting a box of a certain size or attaching a rivet to a particular spot on a car's rear bumper. But such robots could not deal with unexpected or random situations.
By learning from digital data (hundreds of thousands of examples of what happens in the physical world), robots can begin to deal with unexpected situations. And when those examples are paired with language, robots can also respond to text and voice prompts, much as chatbots do.
This means that robots, like chatbots and image generators, will also become more agile.
“What's in digital data can be transferred to the real world,” Dr. Chen said.