Introduction

See papers:

The above paper and technical document describe a technique for reducing the number of nodes traversed by IDA* search when finding an optimal solution to the n-puzzle.

IDA* search does not use any memory to store which nodes have already been visited, so it revisits the same nodes many times during its depth-first search. The technique described in the paper is to build a state machine that restricts the successors generated along a given path.

The technique is to find a list of forbidden operator strings, and then use the Aho-Corasick algorithm to build a finite state machine that can be used to skip operator strings that are redundant in the search. For example, exploring up to depth 2 produces the operator strings lr, rl, ud, du, which build an FSM with 9 states.
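As a rough sketch of how such an FSM guides the search, the depth-first expansion can consult a transition table before generating each successor. Everything here (the fsm table, the FORBIDDEN sentinel, and the dfs outline) is an illustrative assumption, not the solver's actual code:

// Illustrative sketch: prune successors whose operator string is forbidden.
// Assumed layout: fsm[state][op] gives the next FSM state, or FORBIDDEN
// when appending op would complete a forbidden operator string.
const int NUM_OPS = 4;     // l, r, u, d
const int FORBIDDEN = -1;  // sentinel for a pruned transition

extern int fsm[][NUM_OPS]; // built from the forbidden strings via Aho-Corasick

void dfs(int fsmState, int depth)
{
    for (int op = 0; op < NUM_OPS; ++op)
    {
        int next = fsm[fsmState][op];
        if (next == FORBIDDEN)
            continue; // this operator string is redundant; skip the branch
        // ... apply op to the board, recurse with dfs(next, depth + 1),
        // then undo op ...
    }
}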

Results

The branching factor was calculated by finding the number of paths of lengths 23, 24, and 25. The geometric mean was taken of the branching factors of odd and even parity, i.e. $\sqrt{\frac{N_{25}}{N_{24}} \cdot \frac{N_{24}}{N_{23}}}$, which simplifies to $\sqrt{N_{25}/N_{23}}$.

The result for duplicate operators matches what has already been shown in this paper:

Branching factor, calculated by:

./bin/puzzle -d databases/6666-reg --fsmDepthLimit 2 -e

The set of 16366272 duplicate operator strings for depth 22 can be downloaded here:

Type | Number of strings | FSM states | Branching factor
--- | --- | --- | ---
idfs2 | 4 | 5 | 2.36762 = sqrt(2.39446 * 2.34108)
idfs14 | 18414 | 65558 | 2.24915 = sqrt(2.27542 * 2.22317)
idfs22 | 16366272 | 61701626 | 2.16475 = sqrt(2.18836 * 2.1414)

Experimental results

Using the first two problems from Solving the 24-Puzzle.

The single-threaded solver uses plain IDA*, not a hybrid search. See log file: duplicate-nodes-log.txt.

Number of nodes

problem # | idfs2 | idfs14 (% of idfs2) | idfs22 (% of idfs2)
--- | --- | --- | ---
1 | 58,644,915,709 | 23,045,422,923 (39.29%) | 14,531,111,159 (24.78%)
2 | 21,770,762,064 | 7,842,119,256 (36.02%) | 5,089,706,902 (23.38%)

Runtime

problem # | idfs2 | idfs14 (% of idfs2) | idfs22 (% of idfs2)
--- | --- | --- | ---
1 | 7,029.19s | 2,977.23s (42.3%) | 2,527.94s (36.0%)
2 | 2,447.82s | 956.079s (39.0%) | 870.295s (35.6%)

Nodes per second

fsm type | nodes per second (total nodes / total seconds over both problems)
--- | ---
idfs2 | 8,485,342.7 nodes/s (80,415,677,773 / 9,477.01)
idfs14 | 7,852,813.5 nodes/s (30,887,542,179 / 3,933.309)
idfs22 | 5,773,826.1 nodes/s (19,620,818,061 / 3,398.235)

These results show that as the state machine grows in number of states, the nodes per second decrease, probably due to worse cache locality. However, the reduced branching factor drastically decreases the number of nodes that need to be expanded in the search.

In these two problems, idfs22 outperformed idfs14; however, there will be easier problems where idfs14 performs better.

Algorithm to find the duplicate operator strings

The technique suggested in the paper uses a BFS to explore a larger version of the puzzle. For example, for the 24-puzzle (5x5), you would run a BFS on the 9x9 puzzle so that all paths are found. In the search, paths where the bounding box of the blank tile is wider or taller than 5 cells are excluded, since they cannot occur on the 5x5 board. For example, lrrrrr cannot occur in the 24-puzzle, but can occur on the 9x9 grid.
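The bounding-box check can be implemented by tracking the blank tile's displacement while replaying the operator string. This is a minimal sketch; the BoundingBox struct and function names are assumptions, not the actual implementation:

// Sketch: bounding box of the blank tile's displacement, relative to
// its starting position, while replaying an operator string.
#include <algorithm>
#include <string>

struct BoundingBox { int minX, maxX, minY, maxY; };

BoundingBox boxOf(const std::string &ops)
{
    BoundingBox b{0, 0, 0, 0};
    int x = 0, y = 0;
    for (char op : ops)
    {
        if (op == 'l') --x;
        else if (op == 'r') ++x;
        else if (op == 'u') --y;
        else ++y; // 'd'
        b.minX = std::min(b.minX, x); b.maxX = std::max(b.maxX, x);
        b.minY = std::min(b.minY, y); b.maxY = std::max(b.maxY, y);
    }
    return b;
}

// Can this string occur on an N x N board for at least one start position?
bool fitsOnBoard(const std::string &ops, int N)
{
    BoundingBox b = boxOf(ops);
    return (b.maxX - b.minX + 1) <= N && (b.maxY - b.minY + 1) <= N;
}
// fitsOnBoard("lrrrrr", 5) == false: the horizontal span is 6 cells.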

By keeping track of the board states that have been visited before, we can exclude operator strings that revisit a known board state. We must ensure that both operator strings can occur for every starting position of the blank tile. Taylor describes how to check this “operator precondition”.

To deal with this situation, a routine was written to test the “bounding box” of the actions of a pair of strings, A and B. B can be considered a duplicate only if it matches A (both strings lead to the same board state) and A has a bounding box contained within, or identical to, that of B.
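Continuing the sketch above, the precondition check reduces to a bounding-box containment test (again with hypothetical names, reusing boxOf):

// Wherever B is applicable, A is applicable too, because A's bounding
// box lies inside (or equals) B's.
bool boxContained(const BoundingBox &a, const BoundingBox &b)
{
    return a.minX >= b.minX && a.maxX <= b.maxX &&
           a.minY >= b.minY && a.maxY <= b.maxY;
}
// B may be forbidden in favour of A only if, in addition to the strings
// reaching the same board state, boxContained(boxOf(A), boxOf(B)) holds.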

For example:

When the blank tile begins at position 1, as in the video, one of these operator strings can be considered a duplicate. However, if the blank tile begins at position 0, only the right operator string is valid, so there is no duplicate operator string in this case.

So to consider either of these as a duplicate operator is wrong, and could result in missing the optimal solution.

How to handle running out of memory

When using BFS and keeping track of seen board states, memory is quickly exhausted.

A key observation: if we have a list of operator strings of length n-1, and we know all shortest paths of length n for a given board state, we can write a function that will always choose the same subset of these shortest paths to be considered duplicate operators.

For this set of operator strings, we want to split it into two groups: a set of permitted operators and a set of forbidden operators. We must then not violate the following constraint:

For each forbidden operator, there must be at least one permitted operator of length less than or equal to it whose bounding box is a subrange of the forbidden operator's bounding box.
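Translated directly into code, the constraint might be checked like this sketch, reusing the hypothetical boxOf and boxContained helpers from above:

#include <string>
#include <vector>

// Sketch: does this permitted/forbidden split satisfy the constraint?
bool isValidPartition(const std::vector<std::string> &permitted,
                      const std::vector<std::string> &forbidden)
{
    for (const std::string &f : forbidden)
    {
        bool covered = false;
        for (const std::string &p : permitted)
        {
            if (p.size() <= f.size() && boxContained(boxOf(p), boxOf(f)))
            {
                covered = true; // a shorter-or-equal permitted path
                break;          // is always applicable where f is
            }
        }
        if (!covered)
            return false;
    }
    return true;
}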

To compute this, we look at all 2-partitions of the set of operators for a given board state, and we use a scoring function that always selects the same “best” valid 2-partition.
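Because only the shortest paths to a single board state are partitioned, the sets are small enough to enumerate every 2-partition with a bitmask. A sketch, assuming the scoring function described in the next section returns higher scores for better partitions:

#include <string>
#include <vector>

int score(const std::vector<std::string> &forbidden); // next section's sketch
bool isValidPartition(const std::vector<std::string> &permitted,
                      const std::vector<std::string> &forbidden); // as above

std::vector<std::string> bestForbiddenSet(const std::vector<std::string> &paths)
{
    std::vector<std::string> best;
    int bestScore = -1;
    for (unsigned mask = 0; mask < (1u << paths.size()); ++mask)
    {
        // bit i decides whether paths[i] is forbidden (1) or permitted (0)
        std::vector<std::string> permitted, forbidden;
        for (size_t i = 0; i < paths.size(); ++i)
            (mask & (1u << i) ? forbidden : permitted).push_back(paths[i]);
        if (!isValidPartition(permitted, forbidden))
            continue;
        int s = score(forbidden);
        if (s > bestScore) // ties broken by the deterministic fallback order
        {
            bestScore = s;
            best = forbidden;
        }
    }
    return best;
}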

Since the forbidden operator strings we are computing form a set, it does not matter if we process the same board state more than once! Because of this, an iterative DFS can be used to find all board states and all paths to each board state at a given depth. When searching at depth n, an FSM built from operator strings of length n-1 is used to find new operator strings of length n. When memory is running low, a clean-up process can begin: a DFS of depth n which does not track any more board states; it only finds paths to known board states with paths of length n. This clean-up process finds new duplicate operators and removes the board states from memory, since the duplicate operators of these board states have already been computed.
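Putting these pieces together, the overall loop might be structured like the following skeleton. Every helper named here is an illustrative assumption about the implementation:

#include <string>
#include <vector>

struct Fsm { /* Aho-Corasick transition table */ };
Fsm buildFsm(const std::vector<std::string> &duplicates);
void dfsCollect(int n, const Fsm &fsm); // tracks states and length-n paths
void dfsCleanup(int n, const Fsm &fsm); // matches known states only, evicts them
bool memoryLow();
std::vector<std::vector<std::string>> collectedPaths();
std::vector<std::string> bestForbiddenSet(const std::vector<std::string> &);

void findDuplicates(int maxDepth, std::vector<std::string> &duplicates)
{
    for (int n = 2; n <= maxDepth; ++n)
    {
        // FSM built from the duplicate strings of length < n
        Fsm fsm = buildFsm(duplicates);
        dfsCollect(n, fsm);     // find board states and their length-n paths
        if (memoryLow())
            dfsCleanup(n, fsm); // the clean-up pass described above
        for (const auto &paths : collectedPaths())
            for (const auto &f : bestForbiddenSet(paths))
                duplicates.push_back(f); // grow the forbidden set
    }
}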

In this way, we can compute very long duplicate operators, at the cost of significantly more computation. When the clean-up process is used, the effective branching factor is squared, so the work grows as O(5.57^n) instead of O(2.36^n) (5.57 ≈ 2.36²). The memory requirement is reduced and is now bounded by the number of paths of a given length to a single board state, which is minuscule up to depth 22.

Improving the scoring function

A naive scoring function is to count the number of possible starting positions of the blank tile for a forbidden operator. For example, a 2x2 bounding box in a 5x5 grid has 16 possible starting positions. When scoring a 2-partition, we take the sum of the possible starting positions over each operator in the forbidden partition.
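As a sketch, this count follows directly from the bounding box (reusing the hypothetical BoundingBox and boxOf from the earlier sketch): a w x h box on an N x N board can be translated (N - w + 1) * (N - h + 1) ways.

#include <string>
#include <vector>

int startPositions(const std::string &ops, int N)
{
    BoundingBox b = boxOf(ops); // from the earlier sketch
    int w = b.maxX - b.minX + 1;
    int h = b.maxY - b.minY + 1;
    return (N - w + 1) * (N - h + 1); // e.g. 4 * 4 = 16 for a 2x2 box
}

// Naive score of a 2-partition: sum over the forbidden side.
int score(const std::vector<std::string> &forbidden)
{
    int sum = 0;
    for (const std::string &f : forbidden)
        sum += startPositions(f, 5); // N = 5 for the 24-puzzle
    return sum;
}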

I could not find any improvement on this scoring function. An idea that was explored was looking at the sum of the probabilities of the starting positions of the blank tile. However, this did not produce a different scoring order.

The ordering used was this sum of starting positions, with a fallback to lexicographic ordering when two partitions had the same score. Note that changing the fallback ordering results in a different set of forbidden operator strings, which suggests that there may be a scoring function that produces better run times.

FSM Structural Improvement

The first four duplicate operator strings are lr, rl, ud, du. The basic Aho-Corasick automaton is represented like this:

This is a trie whose nodes each have four children and a boolean flag marking whether the node completes a word. Our implementation does not need a failure function because we populate every edge of the trie; using a double-array trie would not be space efficient because our character set is small.

So each trie node is four ints plus a boolean, which is 17 bytes per node:

struct TrieNode
{
    int children[4]; // one child per operator: l, r, u, d
    bool isWord;     // does this node complete a forbidden string?
};
TrieNode trie[NUM_STATES]; // one entry per FSM state

We can improve on this by noticing that we never visit word nodes. When IDA* detects a word through the FSM, it does not search that branch. So we can replace every word node in the trie with a dummy integer value.

In this example, the FSM shrinks from the original 9 states to 5. This is also faster because checking whether a child is a word needs one less indirection:

// old structure
bool isChildAWord = trie[node.children[0]].isWord;

// new structure: word nodes are the dummy value, so no second lookup
bool isChildAWord = node.children[0] == WORD_STATE;

The amount of space used in this representation is now 16 bytes per node, and there are fewer nodes. For a dictionary of 178302 strings, the state count drops from 832012 to 653710 (one state removed per word), which is about 21.4% smaller.