2

I have about as simple of a negamax algorithm as possible, for evaluating positions in Tic Tac Toe. The state of the game is stored as an array in numpy, with X's pieces represented by 1, and O's pieces represented by four.

I was testing this just now, and found:

a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100

Meaning that my algorithm thinks it has found a way for X to win in seven plies in a game of Tic Tac Toe, which is obviously impossible against decent play. I can't work out how to have it print the best moves it has found, so am having real trouble debugging this. What am I doing wrong?

def winCheck(state):
        """Takes a position, and returns the outcome of that game"""
        # Sums which correspond to a line across a column
        winNums = list(state.sum(axis=0))
        # Sums which correspond to a line across a row
        winNums.extend(list(state.sum(axis=1)))
        # Sums which correspond to a line across the main diagonal
        winNums.append(state.trace())
        # Sums which correspond to a line across the off diagonal
        winNums.append(np.flipud(state).trace())

        if Square.m in winNums:
                return 'X'
        elif (Square.m**2 + Square.m) in winNums:
                return 'O'
        elif np.count_nonzero(state) == Square.m**2:
                return 'D'
        else:
                return None

def moveFind(state):
        """Takes a position as an nparray and determines the legal moves"""
        moveChoices = []

        # Iterate over state, to determine which squares are empty
        it = np.nditer(state, flags=['multi_index'])
        while not it.finished:
            if it[0] == 0:
                    moveChoices.append(it.multi_index)
            it.iternext()
        return moveChoices

def moveSim(state, move, player):
        """Create the state of the player having moved without interfering with the board"""
        simState = state.copy()
        if player == 1:
                simState[move] = 1
        else:
                simState[move] = gamecfg.n + 1
        return simState

def positionScore(state):
        """The game is either won or lost"""
        if winCheck(state) == 'X':
                return 100
        elif winCheck(state) == 'O':
                return -100
        else:
                return 0

def negaMax(state, depth, colour):
        """Recursively find the best move via a negamax search"""
        if depth == 0:
                return positionScore(state) * colour

        highScore = -100

        moveList = moveFind(state)
        for move in moveList:
                score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
                highScore = max(score, highScore)

        return highScore
Qiri
  • 251
  • 1
  • 2
  • 10
  • 1
    Sorry @JacobIRR, I somehow missed some lines in the middle while copying, I have edited the question. – Qiri Jul 02 '17 at 18:18
  • I wonder whether it matters that you don't stop the game when someone has won? Perhaps it is possible to always get a line of 3 X's if you don't mind the other person getting a line of 3 O's first? – Peter de Rivaz Jul 02 '17 at 18:20
  • @PeterdeRivaz That's not something I'd thought of at all, but I have just proven to myself that it is very possible to force three in a row in seven plies if you start in the centre, and are not worried about losing first. – Qiri Jul 02 '17 at 18:33
  • @PeterdeRivaz I've just added if depth == 0 or positionScore(winState) != 0: for negaMax's base case, and it now works as expected. If you make an answer, I'll accept it, and thank you. Do you have any ideas about how I can retrieve the final state, and list of moves, along with highScore? – Qiri Jul 02 '17 at 18:41

1 Answers1

2

Your code does not consider the game to stop when a line of 3 symbols is made.

This means that it is playing a variant of tic-tac-toe where X wins if he makes a line of 3 even after O has made a line of 3.

For this variant, the program has correctly found that it is possible for X to always win!

(I came across the same situation with a chess program I made where the computer was happy to sacrifice its king if it would reach checkmate a little later...)

Peter de Rivaz
  • 33,126
  • 4
  • 46
  • 75