What is the algorithm behind AlphaZero

Artificial intelligence: AlphaZero masters chess, Shogi and Go

With "AlphaZero", the Google subsidiary DeepMind has developed an algorithm that independently learns the strategy games chess, Shogi and Go. And not only that: the resulting "artificial intelligence" (AI) plays each game better than the strongest previous programs (chess: Stockfish; Shogi: Elmo; Go: AlphaGo Zero after three days of training). It is particularly noteworthy that essentially one and the same procedure is used for all three of these different strategy games. As with the AI AlphaGo, also created by DeepMind, the basis is a neural network.
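According to the papers, this network takes a board position as input and produces two outputs: a probability distribution over moves (the "policy") and an estimate of the expected game outcome (the "value"). The following is a minimal, illustrative sketch of such a two-headed network in plain NumPy; the layer sizes and weights are made up, though the 4,672 move slots match the chess move encoding described by DeepMind.

```python
import numpy as np

# Illustrative two-headed network: a shared body feeding a policy head and
# a value head. Hidden size and weight initialization are arbitrary.
N_FEATURES = 64 * 12   # e.g. a flattened chess board with 12 piece planes
N_MOVES = 4672         # AlphaZero's chess move encoding uses 4672 slots

rng = np.random.default_rng(0)
W_body = rng.standard_normal((N_FEATURES, 256)) * 0.01
W_policy = rng.standard_normal((256, N_MOVES)) * 0.01
W_value = rng.standard_normal((256, 1)) * 0.01

def forward(position):
    """position: flat feature vector -> (move probabilities, value in [-1, 1])."""
    hidden = np.maximum(0.0, position @ W_body)   # ReLU body
    logits = hidden @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                        # softmax over all move slots
    value = np.tanh(hidden @ W_value)[0]          # scalar outcome estimate
    return policy, value

p, v = forward(rng.standard_normal(N_FEATURES))
print(p.shape, round(float(p.sum()), 6), -1.0 <= v <= 1.0)
```

In the real system the body is a deep residual convolutional network rather than a single dense layer, and the policy is additionally refined by a tree search before a move is chosen.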

In the complex strategy game Go, it caused a sensation when AlphaGo first beat the top professional Lee Sedol 4:1, then, in a stronger version in 2017, beat the world number one Ke Jie 3:0, and finally, in a further improved version called AlphaGo Zero, learned the game from scratch and surpassed its predecessor.

Learning by "trying it out"

Although, according to DeepMind, AlphaGo Zero manages "without human knowledge" of Go - that is, it was not trained on games of top human players - it was very much tailored to Go, especially in the architecture of its neural networks. The professional world was therefore skeptical as to whether the approach could also be transferred to other strategy games such as chess.

It can: in the research paper "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", recently published on arXiv, the DeepMind team led by David Silver presented the AlphaZero program, which learned all three games within a short period of time - and did so based only on the rules and on playing against itself.
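The core idea of this self-play scheme is to play games against yourself, record each position together with the move distribution and the eventual game outcome, and train the network on those records. The sketch below illustrates only the data-generation side of that loop on a deliberately trivial toy game (race to a total of 10); the real system plays chess, Shogi or Go and picks moves via a tree search guided by the network, not via a random policy.

```python
import random

def self_play_game(policy):
    """Play one game of a toy game: players alternately add 1-3 to a total,
    whoever reaches 10 first wins. Returns (state, move, outcome) examples."""
    history, total, player = [], 0, 0
    while total < 10:
        move = policy(total)           # real AlphaZero: move from MCTS visit counts
        history.append((total, move, player))
        total += move
        player = 1 - player
    winner = 1 - player                # the player who just moved reached 10
    # Label every recorded position with the final outcome from the
    # perspective of the player to move (+1 for a win, -1 for a loss).
    return [(s, m, 1 if p == winner else -1) for s, m, p in history]

def random_policy(state):
    return random.choice([1, 2, 3])

examples = []
for _ in range(100):                   # the real training runs millions of games
    examples.extend(self_play_game(random_policy))
print(len(examples) > 0)
```

After each batch of games, the network's policy head is trained toward the recorded move distributions and its value head toward the recorded outcomes, and the improved network then generates the next batch of games.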

Massive computing power during training

When the Google team writes that the program learned the games within 24 hours, this has to be seen in relation to the effort involved: the paper casually mentions that 5,000 first-generation Tensor Processing Units (TPUs) and 64 second-generation TPUs were used. TPUs are special chips for calculations in neural networks. The first generation performs 92 tera-operations per second; nothing specific is known about the second.
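For a rough sense of scale, a back-of-the-envelope calculation using only the figure given for the first generation:

```python
# Aggregate throughput of the 5,000 first-generation self-play TPUs alone,
# ignoring the 64 second-generation chips, whose specs are not public.
tops_per_tpu = 92            # tera-operations per second, first generation
n_tpus = 5000
total_tops = tops_per_tpu * n_tpus
print(total_tops)            # 460000 tera-ops/s, i.e. 460 peta-ops/s
```

So the self-play fleet alone delivered on the order of 460 peta-operations per second during training.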

To be fair: the fully trained neural network then plays on a single machine with only four TPUs, which makes for a reasonably fair comparison with the strongest chess and Shogi programs, which had 64 CPU cores available.

Victory in three disciplines

According to DeepMind, the fully trained AlphaZero convincingly beat the strongest previous program in each of the three disciplines under tournament conditions (one minute per move). In chess, this was the open-source program Stockfish, winner of the Top Chess Engine Championship (TCEC) 2016; in Shogi, Elmo, winner of the 27th World Computer Shogi Championship. In Go, the opponent was AlphaGo Zero from DeepMind's own stable.

With a little help ...

AlphaZero could not do entirely without human help either: it is by no means the case that a robot called AlphaZero points its camera at a game board and figures out everything by itself. Quite a bit of human expertise went into encoding the game position and the possible moves as input features of the neural networks, as well as into encoding the output.
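A typical example of such hand-designed input encoding, in the style DeepMind describes, is turning the board into a stack of binary feature planes, one plane per piece type and colour. The sketch below shows this for chess piece placement only; the real input additionally includes planes for castling rights, repetition counts, move history and the side to move.

```python
import numpy as np

# Encode a chess position as 12 binary 8x8 planes: one plane per piece
# type and colour (uppercase = white, lowercase = black).
PIECES = ["P", "N", "B", "R", "Q", "K", "p", "n", "b", "r", "q", "k"]

def encode(board):
    """board: dict mapping (row, col) -> piece letter. Returns (12, 8, 8) planes."""
    planes = np.zeros((len(PIECES), 8, 8), dtype=np.float32)
    for (row, col), piece in board.items():
        planes[PIECES.index(piece), row, col] = 1.0
    return planes

# Two kings on an otherwise empty board:
planes = encode({(0, 4): "K", (7, 4): "k"})
print(planes.shape, int(planes.sum()))   # (12, 8, 8) 2
```

Deciding which planes to feed the network, and how to enumerate the legal moves in the output, is exactly the kind of human expertise the article refers to.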

But it is comparatively little. And so it is likely to fascinate chess connoisseurs that AlphaZero discovered all of the most popular openings played by humans (see the DeepMind paper) - and discarded some of them in the end. (bo)
