High-score Tetris AI using branching algorithms
This post will explore how branching algorithms can improve the performance of agents playing Tetris without gravity.
The techniques used are similar to Nintendo Tetris AI Revisited by Meat Fighter. It may be worth reading his post first, as this post builds upon his work.
PSO configuration
We define an `evaluate` function as a heuristic that gives a score to a Tetris board and block type (e.g. I-PIECE). `evaluate` is calculated as the dot product of a feature vector and a weights vector of the same size, i.e. $\text{evaluate} = \mathbf{w} \cdot \mathbf{f}$. PSO was used to find optimal weights for the weights vector of `evaluate`. The feature vector and the PSO setup used are similar to those of Meat Fighter's post. Note that the branching algorithm is not incorporated in the PSO training phase; `evaluate` is used directly.
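As a concrete sketch, the dot-product heuristic might look like the following (the feature names and numeric values are placeholders, not the trained feature set or weights):

```python
import numpy as np

def evaluate(features: np.ndarray, weights: np.ndarray) -> float:
    # The heuristic is just a weighted sum of board features.
    assert features.shape == weights.shape
    return float(np.dot(weights, features))

# Example with three hypothetical features (holes, total column height, row transitions)
features = np.array([2.0, 31.0, 14.0])
weights = np.array([-7.8, -0.5, -1.2])  # illustrative PSO-style weights
print(evaluate(features, weights))
```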
Notable differences include:
- Using global PSO from pyswarms, probably with different parameters: `{'c1': 2.5, 'c2': 2.5, 'w': 0.92}`
- num particles: 96
- num iterations: 3600
- Removing `lock_height` from the evaluation function
Our configuration for each particle is as follows:
- each particle plays 150 games using the same piece sets
- the score for a particle is the average of the top 50% of games
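A rough sketch of how this training loop could be wired up with pyswarms (the feature-vector size and the `play_games` simulator are stand-ins, not the actual code):

```python
import numpy as np
import pyswarms as ps

N_FEATURES = 8  # assumed size of the feature/weights vector

def play_games(weights, n_games):
    # Stand-in for the real simulator: plays n_games with the candidate weights
    # (same piece sets for every particle) and returns each game's final score.
    # Random scores are just a placeholder so the sketch runs end to end.
    return np.random.default_rng().integers(0, 1_542_000, size=n_games)

def particle_cost(positions: np.ndarray) -> np.ndarray:
    # positions has shape (n_particles, N_FEATURES); pyswarms minimises, so
    # return the negated "average of the top 50% of games" as the cost.
    costs = []
    for weights in positions:
        scores = np.sort(play_games(weights, n_games=150))[::-1]
        costs.append(-scores[: len(scores) // 2].mean())
    return np.array(costs)

optimizer = ps.single.GlobalBestPSO(
    n_particles=96,
    dimensions=N_FEATURES,
    options={'c1': 2.5, 'c2': 2.5, 'w': 0.92},
)
best_cost, best_weights = optimizer.optimize(particle_cost, iters=3600)
```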
Branching algorithm description
The branching algorithm is inspired by alpha-beta pruning algorithms, except with only one agent. The inputs are a board and the next two block types from the piece set (call these $b_1$ and $b_2$). The idea is to search promising branches to find the best placement for $b_1$.
Bfs algorithm
Define the following methods:
- `findAllMoves(board, blockType)`: finds all moves (a.k.a. `PieceInfo`) given a board and a blockType
- `evaluate(board, PieceInfo)`: uses predefined weights to give a score for the game state
Firstly, if there is a move that results in a tetris with $b_1$, select that move.
If not, observe that we can apply these functions twice each to find an ideal position for $b_1$ just by using $b_1$ and $b_2$. This results in a tree of height 2, where we take the branch whose leaf node has the best evaluation. This is the simplest way of utilising $b_1$ and $b_2$, and describes what is used as `Bfs`
in the results.
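In code, this height-2 search might look roughly like the following sketch (`applyMove` is an assumed helper returning the board after a placement, and higher `evaluate` scores are assumed to be better):

```python
def bfs_best_move(board, b1, b2):
    # Expand every placement of b1, then every placement of b2, and keep the
    # b1 placement whose best leaf evaluation is highest.
    best_move, best_score = None, float('-inf')
    for move1 in findAllMoves(board, b1):
        board1 = applyMove(board, move1)      # board after placing b1
        for move2 in findAllMoves(board1, b2):
            score = evaluate(board1, move2)   # leaf of the height-2 tree
            if score > best_score:
                best_score, best_move = score, move1
    return best_move
```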
Bfs+branching algorithm
For more predictive power, we aim to maximise the expected evaluation of a partial search tree. To do this, we sum, over each possible next block type $b$, the evaluation we would get if the next block were $b$, weighted by the probability of $b$ under the piece RNG algorithm. Adding a depth parameter to make the results computable, the recursive relation looks like:
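A sketch of this relation, writing $P(b')$ for the RNG probability of block type $b'$ and $B \oplus m$ for the board after applying move $m$ (the notation here is mine, following the definitions above):

$$
\text{evalGivenBlockType}(B, b, d) = \max_{m \in \text{findAllMoves}(B,\,b)}
\begin{cases}
\text{evaluate}(B, m) & \text{if } d = 0\\
\sum_{b'} P(b') \cdot \text{evalGivenBlockType}(B \oplus m,\, b',\, d-1) & \text{if } d > 0
\end{cases}
$$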
From this, we see that the branching factor for `evalGivenBlockType` calls is the number of block types (7) multiplied by the number of moves returned by `findAllMoves`. To mitigate this, we define a pruning factor $p$ and only continue the search in the top $p$ scores, bringing the branching factor down to $7p$ (however, `evaluate` still needs to be called on all of these states to establish which should be pruned).
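A minimal sketch of this pruned recursion, using a single pruning factor for simplicity (`applyMove` and `piece_probability` are assumed helpers, and higher `evaluate` scores are assumed to be better):

```python
BLOCK_TYPES = "IJLOSTZ"  # the 7 tetromino types

def eval_given_block_type(board, block_type, depth, prune):
    moves = findAllMoves(board, block_type)
    # evaluate() is called on every child so we know which ones to prune
    ordered = sorted(moves, key=lambda m: evaluate(board, m), reverse=True)
    if depth == 0:
        return evaluate(board, ordered[0])    # best leaf evaluation
    best = float('-inf')
    for move in ordered[:prune]:              # only the top `prune` moves are expanded
        child = applyMove(board, move)
        expected = sum(
            piece_probability(b) * eval_given_block_type(child, b, depth - 1, prune)
            for b in BLOCK_TYPES              # weight each possible next block by its probability
        )
        best = max(best, expected)
    return best
```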
An implementation detail is that `evaluate` adds an offset of `-1e9` when a tetris is made. Slight modifications are made to ensure that this offset is carried through to the leaf node of the tree, to promote aggressive decisions.
With pruning enabled (prune factors of 3 for the first layer and 2 for subsequent layers), only one extra layer could be computed in a reasonable time (132x slower than no branching). Profiling reveals that the ratio of `evaluate` to `findAllMoves` calls is about the same between `bfs` and `bfs+branching`, while the number of calls to `evaluate` and `findAllMoves` grows by 154x and 146x respectively.
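For reference, call counts like these can be gathered with the standard library profiler; this is only a sketch, and `run_games` is a stand-in for the bot's actual entry point:

```python
import cProfile
import pstats

# Profile one configuration, then inspect per-function call counts and timings.
cProfile.run("run_games(config='bfs+branching', n_games=150)", "branching.prof")
stats = pstats.Stats("branching.prof")
stats.sort_stats("cumulative").print_stats("evaluate|findAllMoves")
```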
Bfs+branching visualisation
Below is a graph that visualises the branching factor. The numbers on the edges indicate how many children the parent node has. $b_1$ and $b_2$ are special because the block types are known. Note that, for each rectangle node, all the possible moves result in a call of `evaluate`, either for pruning or for finding the best value in the leaf node. This explains the large number of calls to `evaluate` despite the selective search (and therefore why it is much slower to compute).
Results
After training is finished, a different set of 150 games was used to see how the bot performs. This was run with two configurations, with and without branching, using an identical state evaluation function `evaluate`:
name | median | mean | maxout% |
---|---|---|---|
Bfs | 1066910 | 1053015 | 81.33 |
Bfs+branching | 1230460 | 1193441 | 94.67 |
Note that the highest score that can be achieved is 1542000. Raw data: bfs, bfs_branch.csv
We can also compare the probability of a bot achieving a certain score:
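This comparison can be reproduced from the raw CSVs as an empirical survival curve; the file and column names below are assumptions about how the data is laid out:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

for label, path in [("Bfs", "bfs.csv"), ("Bfs+branching", "bfs_branch.csv")]:
    scores = np.sort(pd.read_csv(path)["score"].to_numpy())
    prob = 1.0 - np.arange(len(scores)) / len(scores)  # P(final score >= x)
    plt.step(scores, prob, where="post", label=label)

plt.xlabel("score")
plt.ylabel("probability of reaching score")
plt.legend()
plt.show()
```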
From this data we draw the following conclusions:
- 96% of the time, bfs+branching is superior
- On average, bfs+branching achieves an additional 140426 points (a 13% improvement)
- Notably, bfs+branching is 132x less efficient than bfs (913.5s vs 6.9s to compute)
Possible improvements
There are a few things that can be done to improve the performance of the agent:
- Adjusting the parameters
- Adding or finding more features
- Using a neural network instead of a linear combination of features. A neural network would allow one feature's influence to grow or shrink depending on another feature; for example, the agent should play more safely when the pile height is high or the board is too jagged (see the sketch after this list)
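A minimal sketch of such a non-linear evaluator over the same feature vector (the layer sizes, feature count, and random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, bias1 = rng.normal(size=(16, 8)), np.zeros(16)   # 8 input features -> 16 hidden units
W2, bias2 = rng.normal(size=(1, 16)), np.zeros(1)    # hidden units -> single score

def evaluate_nn(features: np.ndarray) -> float:
    # The ReLU hidden layer lets the effective weight of one feature (e.g. holes)
    # depend on others (e.g. pile height), unlike a fixed linear combination.
    hidden = np.maximum(W1 @ features + bias1, 0.0)
    return float(W2 @ hidden + bias2)
```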
There are a few improvements that will be possible with better hardware:
- Training with a larger dataset
- Training with the branching algorithm instead of `evaluate`
Ultimately I’m satisfied with the results - a 13% improvement is significant, especially considering the score is approaching the theoretical maximum. It would be nice if it were less computationally expensive. There is potential for offloading the evaluation function (which accounts for 75% of the processing) to the GPU, which would be a substantial improvement.
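As a rough illustration of that idea: since `evaluate` is a dot product, many candidate placements could be scored in a single batched matrix-vector product on the GPU. This sketch uses cupy (torch or JAX would work similarly), and the input names are assumptions:

```python
import cupy as cp  # requires a CUDA-capable GPU

def evaluate_batch(feature_matrix, weights):
    # feature_matrix: (n_candidates, n_features), one row of features per placement
    F = cp.asarray(feature_matrix, dtype=cp.float32)
    w = cp.asarray(weights, dtype=cp.float32)
    return cp.asnumpy(F @ w)  # one heuristic score per candidate placement
```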