10

I decided to write a small program that solves TicTacToe in order to try out the effect of some pruning techniques on a trivial game. Solving the full game tree with plain minimax visits 549,946 possible games. With alpha-beta pruning, the number of states evaluated dropped to 18,297. Adding a transposition table brought it down to 2,592. Now I want to see how low that number can go.

The next enhancement I want to apply is a strategic reduction: combine states that have equivalent strategic value. For instance, if X plays first, there is nothing strategically different (assuming your opponent plays optimally) about choosing one corner over another. The same holds for the four edge squares, and the center forms its own class. Reducing to strategically distinct states leaves only 3 states to evaluate on the first move instead of 9. This technique should be especially useful because it prunes states near the top of the game tree. The idea comes from the GameShrink method created by a group at CMU, only I am trying to avoid writing the general form, and just doing what is needed to apply the technique to TicTacToe.

In order to achieve this, I modified my hash function (for the transposition table) to enumerate all strategically equivalent positions (using rotation and flipping functions) and to return the lowest of those values for each board. Unfortunately, my program now thinks X can force a win in 5 moves from an empty board when going first. After a long debugging session, it became apparent that the program was always returning the move stored for the lowest equivalent board, not for the board actually in play (I store the last move in the transposition table as part of my state). Is there a better way to go about adding this feature, or a simple method for translating the stored move back to the current situation with what I have already done?
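The symmetry reduction in my hash function looks roughly like this (a simplified sketch; my real code hashes the result, but a 9-tuple of 'X'/'O'/' ' works for illustration):

```python
def rotate(b):
    """Rotate a 9-tuple board 90 degrees clockwise."""
    return tuple(b[(2 - c) * 3 + r] for r in range(3) for c in range(3))

def reflect(b):
    """Mirror a 9-tuple board left-to-right."""
    return tuple(b[r * 3 + (2 - c)] for r in range(3) for c in range(3))

def canonical(b):
    """Lowest board among the 8 rotations/reflections of b."""
    variants = []
    for _ in range(4):
        variants.append(b)
        variants.append(reflect(b))
        b = rotate(b)
    return min(variants)
```

With this, all four opening corner moves map to the same canonical board, which is exactly what triggers the bug: the table remembers a move for the canonical orientation, not for the orientation actually on the board.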

Nick Larsen
  • 18,631
  • 6
  • 67
  • 96
  • This is an interesting question, and as far as I can tell all the other implementations floating around also use the "check every square" method rather than building a decision tree. I'm not sure if any of this can be called A.I. though :s – Codesleuth Jun 07 '10 at 12:28
  • @Codesleuth careful about terminology -- a decision tree is a machine learning technique that does not apply here – Shaggy Frog Jun 15 '10 at 18:07
  • @Shaggy Frog: If you're talking about neural networks, that's not what I meant. – Codesleuth Jun 16 '10 at 08:12
  • @Codesleuth a decision tree is *not* a neural network, but they are both forms of machine learning algorithms. Also, heuristic search as described here is certainly a form of AI. I recommend you spend some time researching the topic. – Shaggy Frog Jun 16 '10 at 14:07
  • @Shaggy Frog: What? Why are you arguing this? My suggestion that this might not be A.I. was a personal view, in that I don't consider pre-determined decision making tables or incremental searches advanced enough to be intelligence. I've come to this conclusion because I have *already* researched the topic. – Codesleuth Jun 16 '10 at 14:29
  • @Codesleuth if you are conflating neural networks and decision trees, then clearly some more research is in order – Shaggy Frog Jun 16 '10 at 14:32
  • @Shaggy Frog: I think we already established it was an error for me to say "tree" rather than "table". Besides, I won't be researching A.I. any further for now, your insistence will be ineffective. – Codesleuth Jun 16 '10 at 14:41
  • @Codesleuth your comment that you don't consider this AI is somewhat ignorant as well as completely unnecessary. – Shaggy Frog Jun 23 '10 at 15:33

7 Answers

5

My gut feeling is that you are using too big of a hammer to attack this problem. Each of the 9 spots can have one of three labels: X, O, or empty. That gives at most 3^9 = 19,683 unique boards. Since there are 3 equivalent reflections for every board, you really only have about 3^9 / 4 ≈ 5k boards. You can reduce this further by throwing out invalid boards (those with a completed row of X's AND a row of O's simultaneously).

So with a compact representation, you would need less than 10kb of memory to enumerate everything. I would evaluate and store the entire game graph in memory.

We can label every single board with its true minimax value by computing the minimax values bottom-up instead of top-down (as in your tree search method). Here's the general outline: we compute the minimax values for all unique boards and label them all first, before the game starts. To make the minimax move, you simply look at the boards succeeding your current state, and pick the move with the best minimax value.

Here's how to perform the initial labeling. Generate all valid unique boards, throwing out reflections. Start labeling the boards with the most moves (9) and iterate down to the boards with the fewest moves (0). Label every endgame board as a win, loss, or draw. Then, for any non-endgame board where it's X's turn to move: 1) if some successor board is a win for X, label this board a win; 2) if no successor wins but some successor is a draw, label this board a draw; 3) if no successor wins or draws, label this board a loss. The logic is mirrored when labeling for O's turn.

As far as implementation goes, because of the small size of the state space I would code the "if there exists" logic just as a simple loop over all 5k states. But if you really wanted to tweak this for asymptotic running time, you would construct a directed graph of which board states lead to which other board states, and perform the minimax labeling by traversing in the reverse direction of the edges.
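A rough sketch of this bottom-up labeling in Python (my own illustration, skipping the symmetry reduction for brevity; boards are 9-tuples of 'X'/'O'/' ' and X moves first):

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def label_all():
    """Map every reachable board to +1 (X wins), 0 (draw), or -1 (O wins)."""
    # Forward pass: enumerate reachable boards, grouped by piece count.
    by_count = [set() for _ in range(10)]
    by_count[0].add((' ',) * 9)
    for n in range(9):
        for b in by_count[n]:
            if winner(b):
                continue                      # game already over, no successors
            mover = 'X' if n % 2 == 0 else 'O'
            for i in range(9):
                if b[i] == ' ':
                    by_count[n + 1].add(b[:i] + (mover,) + b[i + 1:])
    # Backward pass: label from most pieces (9) down to fewest (0).
    value = {}
    for n in range(9, -1, -1):
        for b in by_count[n]:
            w = winner(b)
            if w:
                value[b] = 1 if w == 'X' else -1
            elif n == 9:
                value[b] = 0                  # full board, no winner: draw
            else:
                mover = 'X' if n % 2 == 0 else 'O'
                succ = [value[b[:i] + (mover,) + b[i + 1:]]
                        for i in range(9) if b[i] == ' ']
                value[b] = max(succ) if mover == 'X' else min(succ)
    return value
```

Playing a move is then a one-step lookahead: from the current board, pick the successor whose stored value is best for the side to move. Without the symmetry reduction this enumerates the 5,478 reachable positions Nick mentions; folding in the symmetries shrinks it further.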

Eric
  • 706
  • 5
  • 13
  • 2
    I'd vote for this as the answer, but you're missing the point of the exercise which is to use a specific hammer. Still, you're right in that there aren't 549,946 possible games of TicTacToe (even including unselected, there isn't that many possible *states*, let alone games). The idea is interesting, but the poster needs to work out the details of what can happen a bit more first. Start with the 512 possible ending states for a 3x3 grid, eliminate the impossible, and *then* work on equivalent moves and states. – jmoreno Jun 22 '10 at 05:36
  • Thanks, maybe I can work out more details to a quick algorithm. I understand your point about the exercise, but I would suggest that there is little value in using a technique that poorly fits a problem for that particular problem, especially when alternatives are not only cheaper, simpler, and faster. Using a technique that doesn't fit a problem also may limit what you can learn about that technique (i.e. in the real world if you would have always used an alternative, what reusable knowledge/experience would you have gained from the exercise?) – Eric Jun 22 '10 at 17:29
  • I really appreciate this answer and I have a few questions, and a few points. If you include move history in your state, as basic minimax implicitly does, there are exactly 549,946 possible reachable states which terminate at the moment the board fills or either player makes 3 in a row, and far more if you allow the game to progress beyond finding a winner on a non empty board, though I did not. I do agree the extra work is worthless however, and clearly basic minimax requires an enormous effort for even trivial games, but that does not make it bad for exercise. – Nick Larsen Jun 22 '10 at 20:35
  • The use of a transposition table should reduce the tree size to only the number of states possibly reachable in a game regardless of history, which is great because history does not affect the available moves in tic tac toe and therefore has no effect on the optimal move. Using only a transposition table my small program shows 5478 reachable states in tic tac toe, which I think is correct for my rules. – Nick Larsen Jun 22 '10 at 21:10
  • 1
    Since board spots can be blank, the possible number of states no matter what is 3^9 = 19683. Realizing that valid, reachable states have either the same number of X's as O's or 1 more X than O's, and many states are not reachable due to my rules, 5478 seems reasonable. 2^9 only represents the leaf nodes for filling every position on the board in no particular order. – Nick Larsen Jun 22 '10 at 21:19
  • I have some questions about your algorithm but they will have to wait until after dinner. – Nick Larsen Jun 22 '10 at 21:31
  • You're right -- I completely forgot about incomplete games. So with X, O or blank, we start off with 3^9 = 19,683 as you mentioned. For every one of these 19,683 boards, there are three other redundant reflections of it, so we really have 3^9 / 4 = 4,920. Any additional rules you've got would cut this down even further. Minimax is just a rule of how to propagate the value of states several moves ahead back up to the present. In general it isn't committed to tree search vs. graph search. The algorithm I outlined interprets the game as a graph, while your approach interprets it as a tree. – Eric Jun 23 '10 at 00:38
  • The use of tree search in this game blows up quickly because there are an exponential number of paths that reach an identical state in the game tree, and the transposition table can help reduce that. But if you search the game graph right from the start, you avoid all those exponential paths in the first place. Most interesting games have graphs too large to fit in memory, so you would search the tree instead, but not this one. – Eric Jun 23 '10 at 00:49
  • Assuming you must use a tree search algorithm, the next improvement I would add to your program is to just include a cache that maps states to minimax values. The cache will never grow beyond memory limits so don't worry about evicting old entries. During search, check the current state against the cache. On a cache-hit, return immediately the previously computed minimax value in the cache. On a cache-miss, proceed normally with your algorithm, and when it returns from searching that subtree of this state, insert the newly computed minimax value along with the state into your cache. – Eric Jun 23 '10 at 00:54
  • Incidentally, the minor change here would be that instead of caching a really large set of move sequences (as your transposition table does), we're caching the much smaller set of states instead. – Eric Jun 23 '10 at 01:05
  • I think you misunderstood how I am using a transposition table. I do not not take into account move history when calculating the position hash, and therefore do not store move history as part of my state. The cache you suggest describes my transposition table. As part of my state, I store the best follow up move to this position so I do not have to check the minimax values for all subsequent states of the current position, but not the path leading to a particular state. – Nick Larsen Jun 23 '10 at 13:41
  • If I am not mistaken, a transposition table which only takes relevant move history into account (think en passant for chess) exactly reduces the search to a graph, hence its power. – Nick Larsen Jun 23 '10 at 13:56
  • If I understand your algorithm correctly, first I should enumerate all of the possible reachable positions, calculate and store their minimax values in a state graph and traverse the graph as the game is being played. If that is correct, can you explain how that is different than using a minimax search and storing evaluated states in a transposition table? If not, can you explain your solution in a little more detail? – Nick Larsen Jun 23 '10 at 14:02
  • Your calculation for the number of states, as already mentioned, is off by a factor of magnitude, which scuttles the rest of your answer. – Shaggy Frog Jun 23 '10 at 15:30
  • @NickLarsen I did misunderstand your use of a transposition table. I think you are in fact doing the same thing as I had described with a cache, so there is no difference. My original answer was an alternative method of computing the same result without using tree search. – Eric Jun 23 '10 at 18:21
  • If you allow your transposition table to store all nodes down to the leaves, you don't have to do any search after each move as the game progresses. You can just use your transposition table as a lookup table after the initial computation. You can't typically allow your transposition table to store the entire tree but you can here since the state space will just be about 5k nodes (don't try this Shaggy because it looks like this order of magnitude difference exceeds your computer's memory capacity). – Eric Jun 23 '10 at 18:34
  • @Eric You are describing solving the game. That's fine but it doesn't really answer the OP's question, nor will it help other users with similar questions (using rotations and reflections to reduce the branching factor), especially if those users have non-trivial domains (it took my former supervisor years to solve checkers using your approach). Meanwhile your original answer still contains the obvious error calculating the size of the state space. Finally, your snark about memory is unhelpful. – Shaggy Frog Jun 24 '10 at 17:01
  • @Shaggy I've revised the answer as you have suggested to correct the state-space estimate. As your other comments are not about helping Nick but rather directed personally at me, I won't be responding to them. Thanks. – Eric Jun 24 '10 at 21:17
  • @Eric If that's your policy you should reconsider pitching first-strike personal comments of your own. Just saying. – Shaggy Frog Jun 25 '10 at 03:00
  • For those interested in numbers regarding tic tac toe - http://sakuya.pl/show/0048 – rr- Jan 22 '14 at 18:40
2

You're on the right track when you're thinking about reflections and rotations. However, you're applying it to the wrong place. Don't add it to your transposition table or your transposition table code -- put it inside the move generation function, to eliminate logically equivalent states from the get-go.

Keep your transposition table and associated code as small and as efficient as possible.
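A sketch of what this looks like in the move generator (my own Python illustration; `canonical` takes the lexicographic minimum over the 8 rotations/reflections of a 9-tuple board):

```python
def rotate(b):   # 90 degrees clockwise
    return tuple(b[(2 - c) * 3 + r] for r in range(3) for c in range(3))

def reflect(b):  # mirror left-to-right
    return tuple(b[r * 3 + (2 - c)] for r in range(3) for c in range(3))

def canonical(b):
    variants = []
    for _ in range(4):
        variants.append(b)
        variants.append(reflect(b))
        b = rotate(b)
    return min(variants)

def gen_moves(b, player):
    """Yield (square, successor), pruning symmetry-equivalent successors."""
    seen = set()
    for i in range(9):
        if b[i] == ' ':
            succ = b[:i] + (player,) + b[i + 1:]
            key = canonical(succ)
            if key not in seen:   # skip moves equivalent to one already yielded
                seen.add(key)
                yield i, succ
```

The transposition table then never sees the pruned branches, and every stored best move refers to an orientation that was actually searched.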

Shaggy Frog
  • 27,575
  • 16
  • 91
  • 128
  • This helped a lot. Funny enough though, as soon as I did this, I ran into all kinds of problems. Turns out I was focusing on solving the game up front, and attempting to play from previously evaluated positions, which sometimes had not been evaluated. Once I modified it to just re run the search for each move, all the problems went away. Now I am on to move ordering techniques. Do you know off the top of your head of any good move ordering publications? – Nick Larsen Jun 23 '10 at 14:07
  • Not offhand as move ordering is usually heavily tied in with the domain. Near the top of the search tree it can be quite important but near the bottom it might cost more time than it saves in node expansions. Consider only doing it at nodes (say) >=3 ply from the bottom. One cheap and easy thing to do in general is to do a depth-1 search for each node and then sort based on that, which helps you leverage your existing evaluation function. – Shaggy Frog Jun 23 '10 at 15:27
2

Out of curiosity, I wrote a program to build a full transposition table to play the game without any additional logic. Taking the 8 symmetries into account, and assuming the computer (X) starts and plays deterministically, only 49 table entries are needed!

1 entry for empty board

5 entries for 2 pieces

21 entries for 4 pieces

18 entries for 6 pieces

4 entries for 8 pieces

Avi Tal
  • 157
  • 7
  • 1
    This is awesome! Can you make it handle playing second as well? – Nick Larsen Apr 23 '20 at 14:56
  • 2
    @NickLarsen Sure. Also, if the computer starts at the center and does not play a corner first (while still playing "best play" and never losing), the table becomes much smaller... I am now trying to analytically determine (and simulate) the minimum transposition table size required for "best play" in other games, assuming the computer plays deterministically – Avi Tal Apr 23 '20 at 22:30
  • 10 years later, this is amazing. – Nick Larsen Apr 24 '20 at 15:24
  • "deterministic" means that the computer will respond exactly the same each time on each specific board configuration. – Avi Tal Jan 19 '21 at 20:14
  • I only encode the "possible" game tree. Meaning, game positions that are not possible because the computer will never lead the game to these situations are not encoded. In general, this allows storing only about an order of square root of the total possible positions. – Avi Tal Jan 19 '21 at 21:29
  • 1
    Now I got a solution with N = 37, see here: https://stackoverflow.com/a/65800312/502187 But I am storing the boards with an odd number of pieces on them. –  Jan 19 '21 at 22:19
1

You need to return the (reverse) transposition along with the lowest value position. That way you can apply the reverse transposition to the prospective moves in order to get the next position.
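Concretely (a hypothetical Python sketch, not from the OP's program): each symmetry is an index permutation, so the permutation itself records how to map a stored move back to the real board.

```python
def rotate(b):   # 90 degrees clockwise on a 9-tuple
    return tuple(b[(2 - c) * 3 + r] for r in range(3) for c in range(3))

def reflect(b):  # mirror left-to-right
    return tuple(b[r * 3 + (2 - c)] for r in range(3) for c in range(3))

def perms():
    """The 8 symmetries as index permutations: transform(b)[i] == b[p[i]]."""
    p, out = tuple(range(9)), []
    for _ in range(4):
        out.append(p)
        out.append(reflect(p))
        p = rotate(p)
    return out

PERMS = perms()

def canonicalize(b):
    """Return (lowest equivalent board, the permutation that produced it)."""
    return min((tuple(b[i] for i in p), p) for p in PERMS)

def unmap(move, p):
    """Translate a move stored for the canonical board back to the real one."""
    return p[move]
```

So when the transposition table says the best move for the canonical board is square `m`, play square `p[m]` on the actual board.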

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • This is a technique I have found very valuable in the past, particularly for very complex games like when I was working on poker, but it felt like it would be more effort than it was worth for this simple project. I have never written a general form state mapper either. Do you know of any? Might be a good project to take on sometime. – Nick Larsen Jun 23 '10 at 14:11
0

Why do you need to make the transposition table mutable? The best move does not depend on the history.

namin
  • 37,139
  • 8
  • 58
  • 74
0

There is a lot that can be said about this, but I will just give one tip to reduce your tree size: Matt Ginsberg developed a method called Partition Search which performs equivalence reductions on the board. It worked well in Bridge, and he uses tic-tac-toe as an example.

Nathan S.
  • 5,244
  • 3
  • 45
  • 55
0

You may want to try solving tic-tac-toe using Monte Carlo simulation. If one (or both) of the players is a machine player, it can simply use the following steps (this idea comes from one of the mini-projects in the Coursera course Principles of Computing 1, part of the Fundamentals of Computing specialization taught by Rice University):

Each of the machine players should use a Monte Carlo simulation to choose the next move from a given TicTacToe board position. The general idea is to play a collection of games with random moves starting from the position, and then use the results of these games to compute a good move.

When a particular machine player wins one of these random games, it wants to favor the squares in which it played (in the hope of choosing a winning move) and avoid the squares in which the opponent played. Conversely, when it loses one of these random games, it wants to favor the squares in which the opponent played (to block its opponent) and avoid the squares in which it played.

In short, squares in which the winning player played in these random games should be favored over squares in which the losing player played. Both the players in this case will be the machine players.
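The scoring scheme above might be sketched like this (my own hypothetical Python version, not the mini-project's code; winning squares score +1, losing squares −1, ties are ignored):

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def mc_move(board, player, trials=100):
    """Choose a square for `player` by scoring random playouts from `board`."""
    other = 'O' if player == 'X' else 'X'
    scores = [0] * 9
    for _ in range(trials):
        b, turn = list(board), player
        # Play the game out with uniformly random moves.
        while winner(b) is None and ' ' in b:
            b[random.choice([i for i in range(9) if b[i] == ' '])] = turn
            turn = other if turn == player else player
        w = winner(b)
        if w is None:
            continue                          # ties contribute nothing
        for i in range(9):
            if board[i] == ' ' and b[i] != ' ':
                # Favor the winner's squares, avoid the loser's.
                scores[i] += 1 if b[i] == w else -1
    empty = [i for i in range(9) if board[i] == ' ']
    return max(empty, key=lambda i: scores[i])
```

Note the symmetry of the rule: if the player itself won, its own squares score positively; if the opponent won, the opponent's squares score positively, which is what makes the player block.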

The following animation shows a game played between 2 machine players (that ended in a tie), using 10 MC trials at each board state to determine the next move.

It shows how each of the machine players learns to play the game using only Monte Carlo simulation with 10 trials (a small number) at every board state; the scores shown at the bottom right of each grid square are used by the player whose turn it is to choose its next move (brighter cells represent better moves for the current player, per the simulation results).


Here is my blog on this for more details.

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63