AlphaGo paved the way for DeepMind's latest artificial intelligence, AlphaZero. Unlike the original AlphaGo, which learned in part from human games, this AI is trained from scratch purely through self-play, leaving it free to discover its own strategies without human bias.
Using human input, such as replays of games or hard-coded behavior, has many benefits when training artificial intelligence. For one, it is simply easier in most cases to work from an existing baseline. Also, some behaviors are extremely tricky for an AI to learn on its own. But this comes at a high cost. Any "human" information fed in is going to bias the AI, which in many cases can eventually stop it from learning its own imaginative behaviors.
For example, if you trained an AI algorithm to design cars and gave it millions of example cars to learn from, it could only imagine something pretty close to what we already know as a car. However, if you only told it to design something that can get 5 humans safely along a road from point A to point B, the space of designs it could explore would be vastly larger.
AlphaZero does just that. It receives only one input: the rules of the game. From there it has to train by trial and error, playing against itself. This process is known as reinforcement learning. AlphaZero very quickly attained superhuman skill in chess, shogi and Go. This is huge, as it marks a step toward more general AI: a single algorithm that is not built to solve only one problem, but several.
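To make the idea of learning from nothing but the rules a bit more concrete, here is a minimal sketch of self-play reinforcement learning on a toy game. It is not AlphaZero's actual method (which combines a deep neural network with tree search). It swaps in plain tabular Q-learning, and the game (a single-pile version of Nim where players take 1 to 3 stones and whoever takes the last stone wins), the function names and the parameter values are all assumptions chosen for illustration.

```python
import random

def legal_moves(stones):
    # The only "knowledge" given to the learner: the rules of the game.
    return range(1, min(3, stones) + 1)

def train_self_play(start=10, episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn move values by playing against itself (hypothetical toy setup)."""
    rng = random.Random(seed)
    q = {}  # (stones, move) -> value for the player about to move
    for _ in range(episodes):
        stones = start
        while stones > 0:
            moves = list(legal_moves(stones))
            if rng.random() < epsilon:
                move = rng.choice(moves)  # trial: explore a random move
            else:
                move = max(moves, key=lambda m: q.get((stones, m), 0.0))
            remaining = stones - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so what is good for them is bad for us.
                target = -max(q.get((remaining, m), 0.0)
                              for m in legal_moves(remaining))
            old = q.get((stones, move), 0.0)
            q[(stones, move)] = old + alpha * (target - old)  # error: adjust
            stones = remaining  # both sides share the same table
    return q

def best_move(q, stones):
    # Greedy choice according to the learned values.
    return max(legal_moves(stones), key=lambda m: q.get((stones, m), 0.0))

if __name__ == "__main__":
    q = train_self_play()
    for stones in (5, 6, 7, 10):
        print(stones, "stones -> take", best_move(q, stones))
```

Nobody tells the program the known winning trick for this game (always leave your opponent a multiple of four stones). It has to find it through its own games, which is the same principle AlphaZero applies at a vastly larger scale.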
Chess
To measure AlphaZero's performance in chess, it was matched up against the state-of-the-art engine Stockfish 8. After just 9 hours of training, and under constraints of time and computing power, AlphaZero did not lose a single game in a 100-game match against Stockfish, winning 28 and drawing the rest. With unlimited time and computing power to think, AlphaZero reportedly achieved a win rate of 93%. To put things in perspective, Stockfish can only play chess, and has been fine-tuned and developed for over 10 years!
Shogi
Shogi is a Japanese variant of chess. In a 100-game match against the previous best-known shogi AI, and after just 2 hours of training (yes, really), AlphaZero achieved a win rate of over 90%.
Go
Playing against its predecessor AlphaGo Zero, another AI developed by DeepMind that also learned through self-play but was built specifically for Go, the new model achieved a win rate of over 50% after just 34 hours of training.