In 1997, IBM shocked the world when its computer program, Deep Blue, beat world chess champion Garry Kasparov in a six-game match. Two decades later, AlphaZero, a program designed by DeepMind, a subsidiary of Google, accomplished something far more impressive. Not only can the program play strategy games, including chess, better than any human on Earth; it taught itself how to play them.
Simple but smart
Most game-playing programs are the product of thousands of hand-tuned inputs. Stockfish, for instance, the open-source chess engine widely considered the strongest conventional engine in the world, is constantly updated by a community of contributors.
AlphaZero, however, is stunningly simple. The only knowledge that it was programmed with was the rules of chess, shogi and go.
It’s made up of two parts: a neural network and the Monte Carlo Tree Search algorithm. Initially, when the system is unleashed, it is relatively bad at the game. Like a beginning player, the neural network steers the search down essentially random branches of the game tree and sees what happens.
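To make the search half of that pairing concrete, here is a deliberately tiny sketch in Python. It plays a toy pile game (take one or two stones; whoever takes the last stone wins), and it uses flat Monte Carlo search with purely random playouts, which is a stripped-down cousin of the full tree search described above. In AlphaZero, a trained neural network replaces the random playouts when judging positions; the game, function names, and simulation count here are illustrative choices of mine, not DeepMind's code.

```python
import random

# Toy game: players alternately remove one or two stones from a pile;
# whoever takes the last stone wins.
def moves(pile):
    return [m for m in (1, 2) if m <= pile]

def random_playout(pile):
    """Play the game out with uniformly random moves; return True if
    the player to move from this position ends up winning."""
    to_move_wins = True
    while pile > 0:
        pile -= random.choice(moves(pile))
        if pile == 0:
            return to_move_wins  # this side took the last stone
        to_move_wins = not to_move_wins

def playout_value(pile, simulations=3000):
    """Win probability for the player to move, estimated from random playouts."""
    return sum(random_playout(pile) for _ in range(simulations)) / simulations

def best_move(pile, simulations=3000):
    """One step of Monte Carlo search: try each legal move, estimate how
    good the resulting position is for the opponent, and pick the move
    that leaves the opponent worst off."""
    def opponent_value(m):
        rest = pile - m
        return 0.0 if rest == 0 else playout_value(rest, simulations)
    return min(moves(pile), key=opponent_value)
```

Even with nothing but random playouts, the search discovers the game's known strategy: from a pile of four it takes one stone, leaving the opponent a losing multiple of three.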
However, the more games it plays, the better the neural network understands the game. And of course, it has the advantage of playing much faster than a human. In its first nine hours of existence, AlphaZero played 44 million games of chess against itself. After two hours, it was better than any human. After four hours, it was beating the best chess program in the world.
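The improvement-through-self-play idea can also be sketched in miniature. The snippet below, my own illustration rather than anything from DeepMind, learns the same toy pile game purely by playing against itself: a lookup table of win-probability estimates stands in for AlphaZero's deep network, and each finished game nudges the estimates for the positions it visited toward the observed result.

```python
import random

def moves(pile):
    return [m for m in (1, 2) if m <= pile]

def self_play_train(max_pile=12, games=20000, eps=0.2, lr=0.05):
    """Learn a win-probability table purely from self-play.
    (AlphaZero updates a deep network from millions of games;
    here a plain lookup table stands in for the network.)"""
    value = {n: 0.5 for n in range(1, max_pile + 1)}  # start knowing nothing
    for _ in range(games):
        pile = random.randint(1, max_pile)
        visited = []                    # (position, player-to-move) along the game
        player = 0
        while pile > 0:
            visited.append((pile, player))
            if random.random() < eps:   # explore occasionally
                m = random.choice(moves(pile))
            else:                       # otherwise leave the opponent
                m = min(moves(pile),    # the worst-looking position
                        key=lambda mv: value.get(pile - mv, 0.0))
            pile -= m
            if pile == 0:
                winner = player         # this player took the last stone
            else:
                player ^= 1
        for pos, p in visited:          # nudge estimates toward the outcome
            outcome = 1.0 if p == winner else 0.0
            value[pos] += lr * (outcome - value[pos])
    return value
```

After training, the table rates piles of three, six, and nine as near-certain losses for the player to move, which is the game's true structure; the program was never told this, only the rules.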
In the two decades since losing to Deep Blue, Garry Kasparov had grown used to seeing computer programs attain new heights in gameplay. However, there was something fundamentally different about AlphaZero: it played like a human!
“The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games,” Kasparov wrote in the journal Science. “But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive. Programs usually reflect priorities and prejudices of programmers, but because AlphaZero programs itself, I would say that its style reflects the truth.”
AlphaZero’s triumph over its conventional counterparts illustrates the distinction between the limited capabilities of powerful computers and the potentially limitless capabilities of artificial intelligence. Unlike conventional programs, which rely on humans to supply them with knowledge, AI develops knowledge on its own. For better or worse, its humanlike ability to learn holds the potential to transform the world.
It’s less of a supercomputer than a superhuman.
After losing to AlphaGo, AlphaZero’s predecessor, world go champion Lee Sedol was initially despondent and issued an apology to fans for being “so powerless.” Eventually, however, he showed appreciation for his adversary: “Maybe it can show humans something we’ve never discovered. Maybe it’s beautiful.”
DeepMind releases AlphaZero preprint
The DeepMind team releases a preprint introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in chess, shogi and go by defeating world-champion programs Stockfish, elmo, and the 3-day version of AlphaGo Zero.
MuZero takes the stage
DeepMind publishes a new paper detailing MuZero, an algorithm that generalises AlphaZero’s approach, mastering both Atari and board games without being given the rules of the game.