Today's most popular virtual assistants (Siri, Alexa, Google Assistant) are far less impressive than modern AI-powered chatbots like ChatGPT and Google Bard. If the fruits of the recent generative AI boom are properly integrated into those legacy assistant bots, they should become much more interesting.
To preview what might come next, I took an experimental AI voice helper called vimGPT for a test run. When I asked it to “subscribe to WIRED,” it went to work with admirable skill, finding the correct web page and accessing the online form. If it had had access to my credit card details, it would almost certainly have succeeded.
Although it's hardly a test of human-level intelligence, buying something online on the open web is far more complex and difficult than the tasks typically handled by Siri, Alexa, or Google Assistant. (Setting reminders and getting sports results is so 2010.) The agent needs to understand the request, search the web to find the right site, and then work through the relevant pages and forms correctly. My helper correctly navigated to the WIRED subscription page and found the form there, perhaps impressed by the fact that you can receive all of WIRED's entertaining and insightful journalism for just $1 a month. But it failed at the final hurdle because it didn't have a credit card. VimGPT uses Google's open source browser Chromium, which does not store user information. In my other experiments, however, the agent was very adept at finding funny cat videos and cheap airline tickets.
VimGPT is an experimental open source program built by solo developer Ishan Shah, not a product in development, but there is no doubt that Apple, Google, and others are running similar experiments with a view to upgrading Siri and other assistants. VimGPT is built on GPT-4V, a multimodal version of OpenAI's famous language model. By analyzing a request, it can determine what to click or type more effectively than text-only software, which has to untangle complex HTML to make sense of the web. “A year from now, I think the experience of using a computer will be very different,” says Shah, who says he built vimGPT in just a few days. “Most apps will see fewer clicks, more chats, and agents will become an integral part of web browsing.”
Shah is not alone in believing that the next logical step for chatbots like ChatGPT is agents that use computers and roam the web. Carnegie Mellon University professor Ruslan Salakhutdinov, who served as Apple's head of AI research from 2016 to 2020, believes Siri and other assistants are poised for an AI upgrade. “The next evolution will be agents that can perform useful tasks,” says Salakhutdinov. Connecting Siri to the kind of AI that powers ChatGPT would be useful, he says. “But it will be much more impactful when you can ask Siri to do something and it just goes and solves the problem for you.”
Salakhutdinov and his students have developed several simulated environments designed to test and hone the skills of AI helpers at getting things done. These include a dummy ecommerce website, a mocked-up version of a Reddit-like message board, and a classified ads website. This virtual testing ground for putting agents through their paces is called VisualWebArena.