This post explores how branching algorithms can improve the performance of agents playing Tetris without gravity.

The techniques used are similar to those in Nintendo Tetris AI Revisited by Meat Fighter. It may be worth reading his post first, as this post builds upon his work.

PSO configuration

We define an evaluate function as a heuristic that gives a score to a Tetris board and block type (e.g. I-PIECE). evaluate is calculated as the dot product of a feature vector and a weights vector of the same size, i.e. $\text{score} = W_v \cdot F_v$. PSO was used to find optimal weights for evaluate's weights vector. The feature vector and the PSO setup are similar to those in Meat Fighter’s post. Note that the branching algorithm is not incorporated into the PSO training phase; evaluate is used directly.
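As a rough sketch of this scoring (the two features below are illustrative stand-ins for the real feature set, which follows Meat Fighter’s minus lock_height, and evaluating the post-placement board directly simplifies evaluate’s actual inputs):

```python
import numpy as np

def extract_features(board: np.ndarray) -> np.ndarray:
    # board: 2D array of 0/1 cells, row 0 at the top. Only two illustrative
    # features are computed here; the real feature vector is larger.
    rows, cols = board.shape
    filled = board.any(axis=0)                    # columns with any cell set
    first_filled = np.argmax(board, axis=0)       # topmost filled row per column
    heights = np.where(filled, rows - first_filled, 0)
    # Holes: empty cells with at least one filled cell above them.
    holes = sum(int(np.sum(board[first_filled[c]:, c] == 0))
                for c in range(cols) if filled[c])
    return np.array([heights.sum(), holes], dtype=float)

def linear_score(board: np.ndarray, weights: np.ndarray) -> float:
    # score = W_v . F_v
    return float(np.dot(weights, extract_features(board)))
```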

Notable differences include:

  • Using global PSO from pyswarms, probably with different parameters (a setup sketch follows these lists)
    • {'c1': 2.5, 'c2': 2.5, 'w': 0.92}
    • num particles: 96
    • num iterations: 3600
  • Removing lock_height from the evaluation function

Our configuration for each particle is as follows:

  • each particle plays 150 games using the same piece sets
  • the score for a particle is the average of the top 50% of games
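Putting this together, a minimal sketch of the training loop with pyswarms could look like the following. Here play_games and N_FEATURES are hypothetical names, and since pyswarms minimises cost, the game score is negated:

```python
import numpy as np
import pyswarms as ps

N_FEATURES = 8  # assumption: length of the feature/weights vectors

def particle_cost(weight_matrix: np.ndarray) -> np.ndarray:
    # pyswarms passes a (n_particles, dimensions) matrix and expects one
    # cost per particle. play_games is a hypothetical helper that plays
    # 150 games on the shared piece sets and returns the scores.
    costs = []
    for weights in weight_matrix:
        scores = np.sort(play_games(weights, n_games=150))
        top_half = scores[len(scores) // 2:]   # average of the top 50% of games
        costs.append(-top_half.mean())         # negate: pyswarms minimises
    return np.array(costs)

optimizer = ps.single.GlobalBestPSO(
    n_particles=96,
    dimensions=N_FEATURES,
    options={'c1': 2.5, 'c2': 2.5, 'w': 0.92},
)
best_cost, best_weights = optimizer.optimize(particle_cost, iters=3600)
```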

Branching algorithm description

The branching algorithm is inspired by alpha-beta pruning algorithms, except with only one agent. The inputs are a board and the next two block types from the piece set (call these $p_1$ and $p_2$). The idea is to search promising branches to find the best placement for $p_1$.

Bfs algorithm

Define the following methods:

  • findAllMoves(board, blockType): finds all moves (a.k.a. PieceInfo) given a board and a blockType
  • evaluate(board, PieceInfo): uses predefined weights to give a score for the game state (assumed signatures for both are sketched below)
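In the sketches that follow, these two methods are assumed to have roughly these shapes. The fields of PieceInfo are hypothetical, since the post does not spell them out:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PieceInfo:
    # Hypothetical fields: rotation and column pin down one placement.
    block_type: str  # e.g. 'I', 'J', 'L', 'O', 'S', 'T', 'Z'
    rotation: int
    column: int

def findAllMoves(board, block_type) -> list[PieceInfo]:
    # All reachable placements of block_type on board.
    ...

def evaluate(board, piece: PieceInfo) -> float:
    # Score of the game state, using the PSO-trained weights.
    ...
```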

Firstly, if there is a move that results in a tetris with $p_1$, select that move.

If not, observe that we can apply these two functions twice each to find an ideal position for $p_1$ using just $p_1$ and $p_2$. This results in a tree of height 2, where we take the branch whose leaf node has the best evaluation. This is the simplest way of utilising $p_1$ and $p_2$, and describes what is used as Bfs in the results.
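A sketch of this two-ply search, using the assumed signatures above plus two assumed helpers (clears_tetris, and applyPiece from the recurrence in the next section):

```python
def pick_move(board, p1, p2):
    moves = findAllMoves(board, p1)

    # Firstly: if any placement of p1 clears a tetris, take it immediately.
    for move in moves:
        if clears_tetris(board, move):
            return move

    # Otherwise, height-2 search: score each p1 placement by the best
    # evaluation reachable with p2 on the resulting board, and keep the
    # branch with the best leaf.
    best_move, best_score = None, float('-inf')
    for move in moves:
        next_board = applyPiece(board, move)
        score = max((evaluate(next_board, m)
                     for m in findAllMoves(next_board, p2)),
                    default=float('-inf'))
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```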

Bfs+branching algorithm

For more predictive power, we aim to maximise the expected evaluation of a partial search tree. To do this, we sum, over each possible block type $t$, the evaluation given $t$ multiplied by the probability $P(t)$ derived from the piece RNG algorithm. Adding a depth parameter $d$ to make the results computable, the recursive relation looks like:

$$
\texttt{evalBoard}(b, d) = \sum_{t \in T} P(t) \cdot \texttt{evalGivenBlockType}(b, t, d)
$$

$$
\texttt{evalGivenBlockType}(b, t, d) =
\begin{cases}
\max\limits_{p \in M(b, t)} \texttt{evaluate}(b, p) & \text{if } d = 0 \\
\max\limits_{p \in M(b, t)} \texttt{evalBoard}(\texttt{applyPiece}(b, p), d - 1) & \text{otherwise}
\end{cases}
$$

where $T$ is the set of block types and $M = \texttt{findAllMoves}$.
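A direct transcription of this recurrence (BLOCK_TYPES and P are assumed to come from a model of the piece RNG, and applyPiece is the helper named above):

```python
def evalBoard(board, depth):
    # Expected evaluation over the unknown next block type t,
    # weighted by the piece RNG's probability P(t).
    return sum(P(t) * evalGivenBlockType(board, t, depth)
               for t in BLOCK_TYPES)

def evalGivenBlockType(board, block_type, depth):
    moves = findAllMoves(board, block_type)
    if depth == 0:
        return max((evaluate(board, p) for p in moves),
                   default=float('-inf'))
    return max((evalBoard(applyPiece(board, p), depth - 1) for p in moves),
               default=float('-inf'))
```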

From this, we see that the branching factor for evalGivenBlockType calls is the number of block types (7) multiplied by the number of moves from findAllMoves (typically 10-20). To mitigate this, we define a pruning factor $n_1$, and only continue the search in the top $n_1$ scores, bringing the branching factor down to $7n_1$ (however, evaluate still needs to be called on all of these states to establish which should be pruned).
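Folding the pruning into the sketch above (same assumed helpers), the recursive case could look like this:

```python
import heapq

def evalBoardPruned(board, depth, n1):
    return sum(P(t) * evalGivenBlockTypePruned(board, t, depth, n1)
               for t in BLOCK_TYPES)

def evalGivenBlockTypePruned(board, block_type, depth, n1):
    moves = findAllMoves(board, block_type)
    if depth == 0:
        return max((evaluate(board, p) for p in moves),
                   default=float('-inf'))
    # evaluate still runs on every move to rank them; only the top n1
    # placements are expanded, so the branching factor drops to 7 * n1.
    top = heapq.nlargest(n1, moves, key=lambda p: evaluate(board, p))
    return max((evalBoardPruned(applyPiece(board, p), depth - 1, n1)
                for p in top), default=float('-inf'))
```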

An implementation detail is that evaluate adds an offset of -1e9 when a tetris is made. Slight modifications ensure that this offset is carried through to the leaf node of the tree, to promote aggressive decisions.

With pruning on $p_1$ and $p_2$ included (prune factor of 3 for the first layer, 2 for subsequent layers), only one extra layer could be computed in a reasonable time (132x slower than no branching). Profiling reveals that the ratio of evaluate to findAllMoves is about the same between bfs and bfs+branching (73:19 and 78:16 respectively), and that the number of calls to evaluate and findAllMoves grows by 154x and 146x respectively.

Bfs+branching visualisation

Below is a graph that visualises the branching factor. The numbers on the edges indicate how many children the parent node has. $s_1$ and $s_2$ are special because their block types are known. Note that, for each rectangle node, every possible move results in a call to evaluate, either for pruning or for finding the best value at a leaf node. This explains the number of calls to evaluate despite the selective search (and therefore why it is much slower to compute).

Branching graph

Results

After training finished, a different set of 150 games was used to see how the bot performs. This was run with two configurations - with and without branching - using an identical state evaluation function evaluate:

name            median    mean      maxout%
Bfs             1066910   1053015   81.33
Bfs+branching   1230460   1193441   94.67

Note that the highest score that can be achieved is 1542000. Raw data: bfs, bfs_branch.csv

We can also compare the probability of a bot achieving a certain score:

From this data we draw the following conclusions:

  • 96% of the time, bfs+branching is superior
  • On average, bfs+branching achieves an additional 140426 points (a 13% improvement)
  • Notably, bfs+branching is 132x less efficient than bfs (913.5s vs 6.9s to compute)

Possible improvements

There are a few things that can be done to improve the performance of the agent:

  • Adjusting the parameters
  • Adding or finding more features
  • Using a neural network instead of a linear combination of features. A neural network would let the weight of one feature depend on another; for example, the agent should play more safely when the pile height is high or the board is too jagged (a rough sketch follows this list)
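As a rough sketch of that last idea, a tiny MLP could replace the linear score; sizes here are illustrative and training (e.g. PSO over the flattened parameters) is not shown:

```python
import numpy as np

class MLPEvaluator:
    # The hidden layer lets the effective weight of one feature depend on
    # the others, e.g. playing safer when the pile height feature is high.
    def __init__(self, n_features: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=hidden)

    def score(self, features: np.ndarray) -> float:
        h = np.tanh(features @ self.w1 + self.b1)
        return float(h @ self.w2)
```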

There are a few improvements that will be possible with better hardware:

  • Training with a larger dataset
  • Training with the branching algorithm instead of evaluate

Ultimately I’m satisfied with the results - a 13% improvement is significant, especially considering it approaches the theoretical maximum score. It would be nice if it were less computationally expensive. There is potential for offloading the evaluation function (which accounts for 75% of the processing) to the GPU, which would be a substantial improvement.