“First it was text, then images, and now OpenAI has a model that generates video,” Mashable reported the other day. The makers of ChatGPT and Dall-E have just announced Sora, a text-to-video diffusion model. What will no doubt become known as T2V has garnered excited comment across the web, covering the usual spectrum — from “Is this the end of [insert threatened activity here]?” to weary shrugs, and everything in between.
Sora (the name means “sky” in Japanese) is not the first T2V tool, but it looks more sophisticated than previous efforts such as Meta's Make-A-Video. It turns short text descriptions into detailed, high-resolution film clips up to a minute long. One example was generated from the prompt: “A cat waking up its sleeping owner, demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics, and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.” The resulting clip has been doing the rounds on every social network.
Cute? Up to a point. OpenAI is uncharacteristically candid about the limitations of its tool. It may, for example, “struggle to accurately simulate the physics of complex scenes.”
That's putting it mildly. One of the videos in the sample set illustrates the model's difficulties. The prompt that produced it was “a photorealistic close-up video of two pirate ships sailing inside a cup of coffee.” At first glance it's impressive. But then you notice that one of the ships is moving at an inexplicable speed, and it becomes clear that while Sora may know a lot about how light reflects in fluids, it knows little or nothing about the physical laws governing the movement of galleons.
Other limitations: Sora can be a little vague about cause and effect. “A person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.” Tut, tut. It may also “confuse spatial details of a prompt, for example, mixing up left and right.” And so on.
Still, it's a start, and with another billion teraflops of computing power it's sure to get better. And while Hollywood studio heads can continue to sleep peacefully in their king-sized beds, Sora will soon perform well enough to replace some kinds of stock video.
But despite its concessions about the tool's limitations, OpenAI maintains that Sora “serves as a foundation for models that can understand and simulate the real world” — a capability the company calls a “significant milestone” on the road to artificial general intelligence (AGI).
Here's where things get interesting. OpenAI's corporate goal is to achieve the holy grail of AGI, and the company seems to believe that generative AI is a concrete step toward it. The problem is that achieving AGI requires building machines that understand the real world at least as well as we do — which, among other things, means understanding the physics of moving objects. So the implicit bet of the OpenAI project is that, given enough computing power, a machine that can predict how pixels will move on a screen will one day learn how the physical objects those pixels depict behave in the real world. In other words, it is a bet that extrapolating current machine-learning paradigms will eventually produce superintelligent machines.
But an AI that can navigate the real world needs to understand more than how the laws of physics operate in that world. It will also need to understand how humans behave in it. And to anyone who follows Alison Gopnik's research, that looks a long way off for the kinds of machines the world currently calls “AI.”
Gopnik is famous for her research into how children learn. Watching her Ted talk, “What Do Babies Think?”, would be a salutary experience for engineers who imagine that technology holds the answer to intelligence. Decades of research into the sophisticated information-gathering and decision-making that babies do when they play led her to conclude that “babies and young children are like the research and development division of the human species.” This columnist, who spent a year observing his granddaughter's first year of development — especially how she began to grasp cause and effect — is inclined to agree. If Sam Altman and his OpenAI staff are really serious about AGI, maybe they should spend some time with a baby.
What I was reading
Algorithmic politics
Henry Farrell wrote a seminal essay on the political economy of AI.
Bot habits
There's thoughtful material in the Atlantic: How Chatbots Are Changing the Way We Talk, by Albert Fox Cahn and Bruce Schneier.
No call-up
Science-fiction author Charlie Stross has written a blog post on why Britain couldn't introduce conscription even if it wanted to.