His group decided to find out. They built a new, diversified version of AlphaZero that includes multiple AI systems, each trained independently on a different set of situations. The algorithm governing the overall system acts as a kind of virtual matchmaker, Zahavi said, designed to identify which agent has the best chance of succeeding when it's time to make a move. He and his colleagues also coded in a “diversity bonus,” a reward the system earns whenever it draws its strategies from a wide range of choices.
When the new system was set loose to play its own games, the team observed a lot of variety. The diversified AI player experimented with new, effective openings and with novel but sound decisions about specific strategies, such as when and where to castle. It defeated the original AlphaZero in most matches. The team also found that the diversified version could solve twice as many challenging puzzles as the original, and that it could solve more than half of the entire catalog of Penrose puzzles.
“The idea here is not to find one solution, or one policy, that can beat any player; [it uses] this idea of creative diversity,” Cully said.
With access to a larger variety of played games, the diversified AlphaZero had more options when a sticky situation arose, Zahavi said. “If you can control what kinds of games it sees, you can basically control how it generalizes,” he said. Those strange intrinsic rewards (and their associated moves) could become strengths for diverse behaviors. The system could then learn to assess and value the different approaches and see when they were most successful. “We found that this group of agents could actually come to agreement on these positions.”
And, importantly, the implications extend beyond chess.
Real Creativity
Cully said the diversified approach could be useful for any AI system, not just those based on reinforcement learning. He has long used diversity to train physical systems, including a six-legged robot that was allowed to explore different types of movement before he intentionally “injured” it. The robot could then keep moving by using some of the techniques it had developed earlier. “We were trying to find a solution that was different from all the solutions we had found before.” More recently, he has been working with researchers to use diversity to identify promising new drug candidates and improve their effectiveness, and to develop effective stock-trading strategies.
“The goal is to generate a large collection of potentially thousands of different solutions, where every solution is significantly different from the next,” Cully said. That way, just as the diversified chess player learned to do, the overall system can choose the best possible solution for any kind of problem. Zahavi said his AI system clearly shows that “exploring diverse strategies can help you think outside the box and find solutions.”
Zahavi believes that for AI systems to think creatively, researchers simply need to push them to consider more options. This hypothesis suggests a curious connection between humans and machines: Perhaps intelligence is simply a matter of computational power. For an AI system, creativity may come down to the ability to consider and choose from a sufficiently large set of options. As the system is rewarded for choosing a variety of optimal strategies, this kind of creative problem-solving is reinforced and strengthened. Ultimately, it could in theory emulate any kind of problem-solving strategy that is recognized as creative in humans. Creativity would become a computational problem.
Liemhetcharat noted that a diversified AI system is unlikely to fully solve the broader generalization problem in machine learning. But it's a step in the right direction. “This alleviates one of the drawbacks,” she said.
More practically, Zahavi's results resonate with recent work showing how cooperation among humans can lead to better performance on difficult tasks. For example, most of the hit songs on the Billboard 100 list are written by teams of songwriters rather than by individuals. And there is still room for improvement. The diverse approach is currently more computationally expensive, because it must consider many more possibilities than a typical system does. Zahavi is also not convinced that even his diversified AlphaZero captures the full range of possibilities.
“I still [think] there is scope to find different solutions,” he said. “It's not clear to me that, considering all the data in the world, there is [only] one answer to every question.”
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance the public's understanding of science by covering research developments and trends in mathematics, physical sciences, and life sciences.