The perfect score in Ms. Pac-Man, the 999,990-point maximum available on the Atari 2600 version, has finally been reached without a cheat code. But the score was achieved by an AI rather than by a human player.
Artificial intelligence researchers at Maluuba, a Microsoft-owned company, set out to reach the previously unattainable maximum score. Ms. Pac-Man was intentionally designed to be as unpredictable as possible.
That unpredictability has kept players from developing a strategy that reaches the maximum score without help from cheat codes.
The highest score reached so far by a human player, the global number one on the Atari 2600 version, is 266,330. Until now, the game’s 999,990-point maximum remained elusive.
Perfect Score Achieved Through a Unique Tech Combination
This is not the first time AI researchers have used games to train their software. Video games are reportedly better at mimicking real-world chaos and unexpected situations than more static games, so they are a common way of testing machine learning.
For example, back in 2015, Google’s DeepMind AI learned to play 49 Atari games. It did so through reinforcement learning, which gives an AI feedback, whether positive or negative, as it tries to solve a problem.
Now, the Maluuba team has coupled reinforcement learning with a divide-and-conquer strategy, a method the researchers call Hybrid Reward Architecture. The team split the responsibilities in the game among 150 agents.
Each agent received a ‘bite-sized’ job, for example finding one specific pellet. At the same time, the agents worked together to accomplish higher goals. The information they gathered was then passed to a designated ‘top agent’.
The top agent took all of these suggestions into account before moving Ms. Pac-Man.
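The division of labor described above can be illustrated with a minimal Python sketch. It is not Maluuba's implementation: the grid, the `Agent` class, the `top_agent` function, the weights and the distance-based scoring are all hypothetical, and the hand-written scores stand in for the values that real agents would learn through reinforcement learning. It only shows the general idea of many single-purpose agents voting on a move while one aggregator makes the final call.

```python
from dataclasses import dataclass

# Hypothetical action set on a small grid; each action is a (dx, dy) step.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class Agent:
    """One agent that cares about a single object (a pellet or a ghost)."""
    target: tuple   # (x, y) position of the object this agent tracks
    weight: float   # positive for things to reach, negative for things to avoid

    def preferences(self, position):
        """Score every action; ending up closer to the target counts more."""
        scores = {}
        for name, (dx, dy) in ACTIONS.items():
            nx, ny = position[0] + dx, position[1] + dy
            distance = abs(nx - self.target[0]) + abs(ny - self.target[1])
            scores[name] = self.weight / (1 + distance)
        return scores


def top_agent(agents, position):
    """Add up every agent's scores per action and pick the best total."""
    totals = {name: 0.0 for name in ACTIONS}
    for agent in agents:
        for name, score in agent.preferences(position).items():
            totals[name] += score
    return max(totals, key=totals.get), totals


# Assumed example state: two pellets and one ghost around Ms. Pac-Man.
agents = [
    Agent(target=(5, 2), weight=1.0),    # pellet to the right
    Agent(target=(2, 5), weight=1.0),    # pellet below
    Agent(target=(3, 2), weight=-10.0),  # ghost immediately to the right
]
move, totals = top_agent(agents, position=(2, 2))
print(move, totals)
```

In this made-up state, the pellet agent on the right votes for moving right, but the ghost agent's strong negative score for that square outweighs it, so the aggregator heads for the pellet below instead.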
“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem,” stated Harm Van Seijen.
He is the lead author of a paper on this achievement. According to Maluuba, Hybrid Reward Architecture could prove useful and practical in multiple domains. For example, it could help advance natural language processing or help predict how a company’s sales will fare.