Microsoft announced yesterday that researchers at Maluuba, the Montreal-based deep learning startup acquired by Microsoft earlier this year, had used artificial intelligence (AI) to achieve the maximum possible score of 999,990 on the 1980s video game Ms. Pac-Man.
The team from Maluuba used a branch of AI called reinforcement learning to play the Atari 2600 version of Ms. Pac-Man perfectly. The divide-and-conquer method could have broad implications for teaching AI agents to do complex tasks that augment human capabilities.
In a blog post published to the Microsoft website yesterday, Doina Precup, an associate professor of computer science at McGill University, said this is a significant achievement for AI researchers, who have used various video games to test their systems but have found Ms. Pac-Man among the most difficult to crack.
Precup noted that what is truly impressive is not just what was achieved, but how it was achieved. The team succeeded in mastering Ms. Pac-Man by dividing the problem into small pieces, which they then distributed among AI agents.
“This idea of having them work on different pieces to achieve a common goal is very interesting,” Precup said.
She explained that this divide-and-conquer method resembles some theories of how the brain works, and that it could have broad implications for teaching AIs to do complex tasks with limited information.
“That would be really, really exciting because it’s another step toward more general intelligence.”
The Maluuba team calls this method "Hybrid Reward Architecture." It used more than 150 agents, each of which worked in tandem with the others to master Ms. Pac-Man. Individual tasks ranged from finding one specific pellet to staying out of the way of ghosts.
In command of all of these agents was a top agent, which took into account how many agents advocated for moving in a certain direction, as well as the intensity with which they wanted to make that move. Based on the cumulative suggestions and the varying weights ascribed to them, the top agent then decided where to move Ms. Pac-Man.
Harm Van Seijen, a research manager with Maluuba who is the lead author of a new paper about the achievement, said the best results were achieved when each agent acted very egotistically – for example, focused only on the best way to get to its pellet – while the top agent decided how to use the information from each agent to make the best move for everyone.
“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem. It benefits the whole,” he said.
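The aggregation scheme described above can be sketched in a few lines. This is an illustrative toy, not Maluuba's actual implementation: the agent names, directions, and preference values are all hypothetical. It shows how a top agent can weigh both how many sub-agents favor a move and how strongly they favor it.

```python
# Hypothetical sketch of the top agent's aggregation step (not Maluuba's
# real code). Each low-level agent scores every direction; the top agent
# sums those scores and moves in the direction with the highest total,
# so both the number of agents favoring a move and the intensity of
# their preferences shape the final decision.

DIRECTIONS = ["up", "down", "left", "right"]

def top_agent_move(agent_scores):
    """agent_scores: list of dicts mapping a direction to a preference value."""
    totals = {d: 0.0 for d in DIRECTIONS}
    for scores in agent_scores:
        for direction, value in scores.items():
            totals[direction] += value
    # Choose the direction with the highest cumulative preference.
    return max(totals, key=totals.get)

# Example: a pellet agent egotistically prefers "left", while a
# ghost-avoidance agent strongly penalizes "left" and prefers "up".
pellet_agent = {"up": 0.1, "down": 0.0, "left": 0.9, "right": 0.2}
ghost_agent = {"up": 0.6, "down": 0.2, "left": -5.0, "right": 0.1}
print(top_agent_move([pellet_agent, ghost_agent]))  # "up"
```

Each sub-agent stays narrowly focused on its own problem; only the summation step has to reconcile their competing preferences.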
The unpredictability and randomness inherent to the game are especially valuable for researchers working in reinforcement learning. In AI research, reinforcement learning is the counterpart to supervised learning, a more commonly used method in which systems get better at a task as they are fed more examples of good behavior.
With reinforcement learning, an agent gets positive or negative responses for each action it tries, and learns through trial and error to maximize the positive responses, or rewards.
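That trial-and-error loop can be illustrated with a deliberately tiny example. This is not the paper's algorithm, and the action names and reward values are made up; it only shows an agent trying actions, receiving positive or negative rewards, and drifting toward the action that maximizes them.

```python
import random

# Toy trial-and-error loop (hypothetical environment, not the HRA method).
# Action "b" secretly pays off; action "a" does not. The agent discovers
# this purely from noisy reward signals.
random.seed(0)
true_rewards = {"a": -1.0, "b": 1.0}           # hidden from the agent
estimates = {a: 0.0 for a in true_rewards}      # agent's learned values
counts = {a: 0 for a in true_rewards}

for step in range(500):
    # Explore a random action 10% of the time; otherwise exploit the
    # action with the best estimated reward so far.
    if random.random() < 0.1:
        action = random.choice(list(true_rewards))
    else:
        action = max(estimates, key=estimates.get)
    reward = true_rewards[action] + random.gauss(0, 0.5)  # noisy feedback
    counts[action] += 1
    # Incremental average: nudge the estimate toward each observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # the agent settles on "b"
```

No one ever tells the agent which action is correct; the reward signal alone is enough to shape its behavior, which is what makes the approach attractive for problems where labeled examples are hard to come by.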
An AI-based system that uses supervised learning would learn to produce a proper response in a conversation by being fed examples of good and bad responses. A reinforcement learning system, on the other hand, would be expected to learn appropriate responses from only high-level feedback, such as a person saying she enjoyed the conversation, a much more difficult task.
AI experts believe reinforcement learning could be used to create AI agents that make more decisions on their own, allowing them to take on more complex work and freeing people for higher-value tasks.
Van Seijen said he also could see this kind of divide-and-conquer approach being used to make advances in other promising areas of AI research, such as natural language processing.
“It really enables us to make further progress in solving these really complex problems,” he said.