High-score Tetris AI using branching algorithms
This post will explore how branching algorithms can improve the performance of agents playing Tetris without gravity.
The techniques used are similar to Nintendo Tetris AI Revisited by Meat Fighter. It may be worth reading his post first, as this post builds upon his work.
PSO configuration
We define an `evaluate` function as a heuristic that gives a score to a Tetris board and block type (e.g. I-PIECE). `evaluate` is calculated as the dot product of a feature vector and a weights vector of the same size, i.e. $\text{evaluate} = \mathbf{w} \cdot \mathbf{f}$. PSO was used to find optimal weights for the weights vector of `evaluate`. The feature vector and the PSO setup used are similar to those of Meat Fighter's post. Note that the branching algorithm is not incorporated in the PSO training phase; `evaluate` is used directly.
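As a concrete sketch, the dot-product heuristic might look like the following (the feature names and numeric values are placeholders, not the trained feature set or weights):

```python
import numpy as np

def evaluate(features: np.ndarray, weights: np.ndarray) -> float:
    # The heuristic is just a weighted sum of board features.
    assert features.shape == weights.shape
    return float(np.dot(weights, features))

# Example with three hypothetical features (holes, total column height, row transitions)
features = np.array([2.0, 31.0, 14.0])
weights = np.array([-7.8, -0.5, -1.2])  # illustrative PSO-style weights
print(evaluate(features, weights))
```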
Notable differences include:
- Using global PSO from pyswarms, probably with different parameters: `{'c1': 2.5, 'c2': 2.5, 'w': 0.92}`
- num particles: 96
- num iterations: 3600
- Removing `lock_height` from the evaluation function
Our configuration for each particle is as follows:
- each particle plays 150 games using the same piece sets
- the score for a particle is the average of the top 50% of games
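A rough sketch of how this training loop could be wired up with pyswarms (the feature-vector size and the `play_games` simulator are stand-ins, not the actual code):

```python
import numpy as np
import pyswarms as ps

N_FEATURES = 8  # assumed size of the feature/weights vector

def play_games(weights, n_games):
    # Stand-in for the real simulator: plays n_games with the candidate weights
    # (same piece sets for every particle) and returns each game's final score.
    # Random scores are just a placeholder so the sketch runs end to end.
    return np.random.default_rng().integers(0, 1_542_000, size=n_games)

def particle_cost(positions: np.ndarray) -> np.ndarray:
    # positions has shape (n_particles, N_FEATURES); pyswarms minimises, so
    # return the negated "average of the top 50% of games" as the cost.
    costs = []
    for weights in positions:
        scores = np.sort(play_games(weights, n_games=150))[::-1]
        costs.append(-scores[: len(scores) // 2].mean())
    return np.array(costs)

optimizer = ps.single.GlobalBestPSO(
    n_particles=96,
    dimensions=N_FEATURES,
    options={'c1': 2.5, 'c2': 2.5, 'w': 0.92},
)
best_cost, best_weights = optimizer.optimize(particle_cost, iters=3600)
```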
Branching algorithm description
The branching algorithm is inspired by alpha-beta pruning algorithms, except with only one agent. The inputs are a board and the next two block types from the piece set (call these $b_1$ and $b_2$). The idea is to search promising branches to find the best placement for $b_1$.
Bfs algorithm
Define the following methods:
- `findAllMoves(board, blockType)`: finds all moves (a.k.a. `PieceInfo`) given a board and a blockType
- `evaluate(board, PieceInfo)`: uses predefined weights to give a score for the game state
Firstly, if there is a move that results in a tetris with $b_1$, select that move.
If not, observe that we can apply these functions twice each to find an ideal position for $b_1$ just by using $b_1$ and $b_2$. This results in a tree of height 2, where we take the branch whose leaf node has the best evaluation. This is the simplest way of utilising $b_1$ and $b_2$, and describes what is used as `Bfs`
in the results.
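In code, this height-2 search might look roughly like the following sketch (`applyMove` is an assumed helper returning the board after a placement, and higher `evaluate` scores are assumed to be better):

```python
def bfs_best_move(board, b1, b2):
    # Expand every placement of b1, then every placement of b2, and keep the
    # b1 placement whose best leaf evaluation is highest.
    best_move, best_score = None, float('-inf')
    for move1 in findAllMoves(board, b1):
        board1 = applyMove(board, move1)      # board after placing b1
        for move2 in findAllMoves(board1, b2):
            score = evaluate(board1, move2)   # leaf of the height-2 tree
            if score > best_score:
                best_score, best_move = score, move1
    return best_move
```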
Bfs+branching algorithm
For more predictive power, we aim to maximise the expected evaluation of a partial search tree. To do this, we sum, over each possible next block type $b$, the evaluation we would get if the next block were $b$, weighted by the probability of $b$ under the piece RNG algorithm. Adding a depth parameter to make the results computable, the recursive relation looks like:
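A sketch of this relation, writing $P(b')$ for the RNG probability of block type $b'$ and $B \oplus m$ for the board after applying move $m$ (the notation here is mine, following the definitions above):

$$
\text{evalGivenBlockType}(B, b, d) = \max_{m \in \text{findAllMoves}(B,\,b)}
\begin{cases}
\text{evaluate}(B, m) & \text{if } d = 0\\
\sum_{b'} P(b') \cdot \text{evalGivenBlockType}(B \oplus m,\, b',\, d-1) & \text{if } d > 0
\end{cases}
$$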
From this, we see that the branching factor for `evalGivenBlockType` calls is the number of block types (7) multiplied by the number of moves returned by `findAllMoves`. To mitigate this, we define a pruning factor $p$ and only continue the search in the top $p$ scores, bringing the branching factor down to $7p$ (however, `evaluate` still needs to be called on all of these states to establish which should be pruned).
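A minimal sketch of this pruned recursion, using a single pruning factor for simplicity (`applyMove` and `piece_probability` are assumed helpers, and higher `evaluate` scores are assumed to be better):

```python
BLOCK_TYPES = "IJLOSTZ"  # the 7 tetromino types

def eval_given_block_type(board, block_type, depth, prune):
    moves = findAllMoves(board, block_type)
    # evaluate() is called on every child so we know which ones to prune
    ordered = sorted(moves, key=lambda m: evaluate(board, m), reverse=True)
    if depth == 0:
        return evaluate(board, ordered[0])    # best leaf evaluation
    best = float('-inf')
    for move in ordered[:prune]:              # only the top `prune` moves are expanded
        child = applyMove(board, move)
        expected = sum(
            piece_probability(b) * eval_given_block_type(child, b, depth - 1, prune)
            for b in BLOCK_TYPES              # weight each possible next block by its probability
        )
        best = max(best, expected)
    return best
```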
An implementation detail is that `evaluate` adds an offset of `-1e9` when a tetris is made. Slight modifications are made to ensure that this offset is carried through to the leaf node of the tree, to promote aggressive decisions.
With pruning enabled (prune factors of 3 for the first layer and 2 for subsequent layers), only one extra layer could be computed in a reasonable time (132x slower than no branching). Profiling reveals that the ratio of `evaluate` to `findAllMoves` calls is about the same between `bfs` and `bfs+branching`, while the number of calls to `evaluate` and `findAllMoves` grows by 154x and 146x respectively.
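For reference, call counts like these can be gathered with the standard library profiler; this is only a sketch, and `run_games` is a stand-in for the bot's actual entry point:

```python
import cProfile
import pstats

# Profile one configuration, then inspect per-function call counts and timings.
cProfile.run("run_games(config='bfs+branching', n_games=150)", "branching.prof")
stats = pstats.Stats("branching.prof")
stats.sort_stats("cumulative").print_stats("evaluate|findAllMoves")
```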
Bfs+branching visualisation
Below is a graph that visualises the branching factor. The numbers on the edges indicate how many children the parent node has. $b_1$ and $b_2$ are special because the block types are known. Note that, for each rectangle node, all the possible moves result in a call of `evaluate`, either for pruning or for finding the best value in the leaf node. This explains the large number of calls to `evaluate` despite the selective search (and therefore why it is much slower to compute).
Results
After training is finished, a different set of 150 games was used to see how the bot performs. This was run with two configurations, with and without branching, using an identical state evaluation function `evaluate`:
name | median | mean | maxout% |
---|---|---|---|
Bfs | 1066910 | 1053015 | 81.33 |
Bfs+branching | 1230460 | 1193441 | 94.67 |
Note that the highest score that can be achieved is 1542000. Raw data: bfs, bfs_branch.csv
We can also compare the probability of a bot achieving a certain score:
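This comparison can be reproduced from the raw CSVs as an empirical survival curve; the file and column names below are assumptions about how the data is laid out:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

for label, path in [("Bfs", "bfs.csv"), ("Bfs+branching", "bfs_branch.csv")]:
    scores = np.sort(pd.read_csv(path)["score"].to_numpy())
    prob = 1.0 - np.arange(len(scores)) / len(scores)  # P(final score >= x)
    plt.step(scores, prob, where="post", label=label)

plt.xlabel("score")
plt.ylabel("probability of reaching score")
plt.legend()
plt.show()
```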
From this data we draw the following conclusions:
- 96% of the time, bfs+branching is superior
- On average, bfs+branching achieves an additional 140426 points (a 13% improvement)
- Notably, bfs+branching is 132x less efficient than bfs (913.5s vs 6.9s to compute)
Possible improvements
There are a few things that can be done to improve the performance of the agent:
- Adjusting the parameters
- Adding or finding more features
- Using a neural network instead of a linear combination of features. A neural network would allow one feature's influence to grow or shrink depending on another feature; for example, the agent should play more safely when the pile height is high or the board is too jagged (see the sketch after this list)
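A minimal sketch of such a non-linear evaluator over the same feature vector (the layer sizes, feature count, and random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, bias1 = rng.normal(size=(16, 8)), np.zeros(16)   # 8 input features -> 16 hidden units
W2, bias2 = rng.normal(size=(1, 16)), np.zeros(1)    # hidden units -> single score

def evaluate_nn(features: np.ndarray) -> float:
    # The ReLU hidden layer lets the effective weight of one feature (e.g. holes)
    # depend on others (e.g. pile height), unlike a fixed linear combination.
    hidden = np.maximum(W1 @ features + bias1, 0.0)
    return float(W2 @ hidden + bias2)
```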
There are a few improvements that will be possible with better hardware:
- Training with a larger dataset
- Training with the branching algorithm instead of `evaluate`
Ultimately I’m satisfied with the results - a 13% improvement is significant, especially considering the score is approaching the theoretical maximum. It would be nice if it were less computationally expensive. There is potential for offloading the evaluation function (which accounts for 75% of the processing) to the GPU, which would be a substantial improvement.
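As a rough illustration of that idea: since `evaluate` is a dot product, many candidate placements could be scored in a single batched matrix-vector product on the GPU. This sketch uses cupy (torch or JAX would work similarly), and the input names are assumptions:

```python
import cupy as cp  # requires a CUDA-capable GPU

def evaluate_batch(feature_matrix, weights):
    # feature_matrix: (n_candidates, n_features), one row of features per placement
    F = cp.asarray(feature_matrix, dtype=cp.float32)
    w = cp.asarray(weights, dtype=cp.float32)
    return cp.asnumpy(F @ w)  # one heuristic score per candidate placement
```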