TLDR
My MCTS agent implementation runs without errors locally, achieving win rates of >40% against the heuristic-driven minimax agent, but it fails the autograder, which must pass before the project can be submitted. The autograder throws `IndexError: Cannot choose from an empty sequence`. I'm looking for suggestions on which part of the code is most likely to throw this exception.
Hi, I am currently stuck on this project, which I need to clear within 2 weeks in order to complete the program I'm enrolled in. My task, which I have already completed, is to implement an agent to play against the heuristic-driven minimax agent in a game of Isolation between two chess knights. Full implementation details of the game can be found here. For my project, the game is played on a 9 x 11 board using bitboard encoding. My implementation of MCTS is straightforward and follows closely the pseudocode provided in this paper (pg. 6).
In essence, the general MCTS approach comprises these 4 parts, each implemented by the following nested functions in the CustomPlayer class (a simplified sketch of how they fit together follows the list):
- Selection - tree_policy
- Expansion - best_child, expand
- Simulation - default_policy
- Backpropagation - backup_negamax, update_scores
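In skeleton form, one call to uct_search ties these phases together roughly as follows; this is a simplified restatement of the full implementation below, not the exact code:

```python
def uct_search(state, statlist, i):
    # Selection + Expansion: descend the tree and expand one untried action
    leaf = tree_policy(state)

    # Simulation: random playout from the new leaf down to a terminal state
    if not leaf.terminal_test():
        delta = default_policy(leaf)
        # Backpropagation: push the playout result back up this round's path
        backup_negamax(delta)

    # Finally, pick the root action with the best mean score (c = 0)
    return best_child(state, 0)
```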
```python
import math
import random
import time
import logging
from copy import deepcopy
from collections import namedtuple

from sample_players import DataPlayer


class CustomPlayer(DataPlayer):
    """ Implement your own agent to play knight's Isolation

    The get_action() method is the only required method for this project.
    You can modify the interface for get_action by adding named parameters
    with default values, but the function MUST remain compatible with the
    default interface.

    **********************************************************************
    NOTES:
    - The test cases will NOT be run on a machine with GPU access, nor be
      suitable for using any other machine learning techniques.

    - You can pass state forward to your agent on the next turn by assigning
      any pickleable object to the self.context attribute.
    **********************************************************************
    """
    def get_action(self, state):
        """ Employ an adversarial search technique to choose an action
        available in the current state (calls self.queue.put(ACTION) at least)

        This method must call self.queue.put(ACTION) at least once, and may
        call it as many times as you want; the caller will be responsible
        for cutting off the function after the search time limit has expired.

        See RandomPlayer and GreedyPlayer in sample_players for more examples.

        **********************************************************************
        NOTE:
        - The caller is responsible for cutting off search, so calling
          get_action() from your own code will create an infinite loop!
          Refer to (and use!) the Isolation.play() function to run games.
        **********************************************************************
        """
        logging.info("Move %s" % state.ply_count)
        self.queue.put(random.choice(state.actions()))

        i = 1
        statlist = []

        while (self.queue._TimedQueue__stop_time - 0.05) > time.perf_counter():
            next_action = self.uct_search(state, statlist, i)
            self.queue.put(next_action)
            i += 1

    def uct_search(self, state, statlist, i):
        plyturn = state.ply_count % 2
        Stat = namedtuple('Stat', 'state action utility visit nround')

        def tree_policy(state):
            statecopy = deepcopy(state)

            while not statecopy.terminal_test():
                # All taken actions at this depth
                tried = [s.action for s in statlist if s.state == statecopy]
                # See if there's any untried actions left
                untried = [a for a in statecopy.actions() if a not in tried]

                topop = []
                toappend = []

                if len(untried) > 0:
                    next_action = random.choice(untried)
                    statecopy = expand(statecopy, next_action)
                    break
                else:
                    next_action = best_child(statecopy, 1)

                    for k, s in enumerate(statlist):
                        if s.state == statecopy and s.action == next_action:
                            visit1 = statlist[k].visit + 1
                            news = statlist[k]._replace(visit=visit1)
                            news = news._replace(nround=i)

                            topop.append(k)
                            toappend.append(news)
                            break

                    update_scores(topop, toappend)
                    statecopy = statecopy.result(next_action)

            return statecopy

        def expand(state, action):
            """ Returns a state resulting from taking an action from the list of untried nodes """
            statlist.append(Stat(state, action, 0, 1, i))
            return state.result(action)

        def best_child(state, c):
            """ Returns the state resulting from taking the best action.
                c value between 0 (max score) and 1 (prioritize exploration)
            """
            # All taken actions at this depth
            tried = [s for s in statlist if s.state == state]

            maxscore = -999
            maxaction = []
            # Compute the score
            for t in tried:
                score = (t.utility / t.visit) + c * math.sqrt(2 * math.log(i) / t.visit)
                if score > maxscore:
                    maxscore = score
                    del maxaction[:]
                    maxaction.append(t.action)
                elif score == maxscore:
                    maxaction.append(t.action)

            if len(maxaction) < 1:
                logging.error("IndexError: maxaction is empty!")

            return random.choice(maxaction)

        def default_policy(state):
            """ The simulation to run when visiting unexplored nodes. Defaults to uniform random moves """
            while not state.terminal_test():
                state = state.result(random.choice(state.actions()))

            delta = state.utility(self.player_id)
            if abs(delta) == float('inf') and delta < 0:
                delta = -1
            elif abs(delta) == float('inf') and delta > 0:
                delta = 1

            return delta

        def backup_negamax(delta):
            """ Propagates the terminal utility up the search tree """
            topop = []
            toappend = []
            for k, s in enumerate(statlist):
                if s.nround == i:
                    if s.state.ply_count % 2 == plyturn:
                        utility1 = s.utility + delta
                        news = statlist[k]._replace(utility=utility1)
                    elif s.state.ply_count % 2 != plyturn:
                        utility1 = s.utility - delta
                        news = statlist[k]._replace(utility=utility1)

                    topop.append(k)
                    toappend.append(news)

            update_scores(topop, toappend)
            return

        def update_scores(topop, toappend):
            # Remove outdated tuples. Order needs to be in reverse or pop will fail!
            for p in sorted(topop, reverse=True):
                statlist.pop(p)
            # Add the updated ones
            for a in toappend:
                statlist.append(a)
            return

        next_state = tree_policy(state)
        if not next_state.terminal_test():
            delta = default_policy(next_state)
            backup_negamax(delta)

        return best_child(state, 0)
```
The lack of color formatting does make the code hard to read here, so please feel free to check it out on my GitHub instead.
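For context, the score computed in best_child is a UCB1-style value: the child's mean utility plus an exploration bonus, i.e. utility/visit + c * sqrt(2 * ln(i) / visit), where visit is the child's visit count and i is the iteration counter passed into uct_search. c is 1 during selection (inside tree_policy) and 0 when picking the final move.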
I have no issues running the game locally, with my MCTS agent achieving win rates of >40% (under a 150 ms/move limit) against the minimax player. However, when I try submitting my code to the autograder, it gets rejected with the `IndexError: Cannot choose from an empty sequence` exception.
From my discussion with the course representative, we believe that the error is likely caused by the usage of random.choice(). There are 4 instances of its usage in my implementation:
1. Line 39 (in get_action), before the MCTS loop, to feed a random move to the queue
2. Line 66 (in tree_policy), to randomly select one of the moves that has not yet been tried
3. Line 114 (in best_child), to randomly select an action when there is a tie in the score of the best moves
4. Line 122 (in default_policy), to simulate the game with random moves until a terminal state is reached for the chosen move
I assume that the game implementation is correct and that calling state.actions() will always return a list of possible moves as long as the state is not terminal. Therefore, the only instance that can trigger this exception is Item 3: Items 1 and 4 simply select randomly from the available actions, while Item 2 is guarded by an explicit check (len(untried) > 0) that ensures random.choice() is never fed an empty list. Hence, I added logging around Item 3 (even though no exception has ever been thrown while running locally) and, sure enough, nothing was logged after 50 games.
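For completeness, a defensive variant of best_child that would at least avoid the crash and log some context when maxaction ends up empty might look like the sketch below. This is not what my submitted code does, and the fallback to state.actions() assumes best_child is only ever called on non-terminal states:

```python
def best_child(state, c):
    """ Defensive sketch: same scoring as above, but never calls
        random.choice() on an empty list. """
    tried = [s for s in statlist if s.state == state]

    maxscore = -999
    maxaction = []
    for t in tried:
        score = (t.utility / t.visit) + c * math.sqrt(2 * math.log(i) / t.visit)
        if score > maxscore:
            maxscore = score
            maxaction = [t.action]
        elif score == maxscore:
            maxaction.append(t.action)

    if not maxaction:
        # No stats were recorded for this state (tried was empty): log enough
        # context to debug the autograder failure and fall back to any legal
        # move instead of crashing.
        logging.error("best_child: no stats for ply %s, statlist size %s",
                      state.ply_count, len(statlist))
        return random.choice(state.actions())

    return random.choice(maxaction)
```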
I apologize for the lengthy post, but I do hope that someone out there may be able to catch something that I have missed in my implementation.