Questions tagged [monte-carlo-tree-search]

Monte-Carlo Tree Search is a best-first, rollout-based tree search algorithm. It gradually improves its evaluations of nodes in the tree using (semi-)random rollouts through those nodes, focusing a larger proportion of rollouts on the parts of the tree that are the most promising. This tag should be used for questions about the implementation of this algorithm.

Monte-Carlo Tree Search (MCTS) is a best-first tree search algorithm. "Best-first" means that it focuses the majority of its search effort on the most promising parts of a search tree. It is a popular algorithm for automated game playing in Artificial Intelligence (AI), but it also has applications outside of AI.

For an introduction to the types of search trees that MCTS is typically used for, see the minimax tag.

Older algorithms, such as minimax and alpha-beta pruning, evaluate a node by systematically searching all the nodes below it, which can be very computationally expensive. The basic idea of MCTS is to rapidly approximate such systematic evaluations by running (semi-)random rollouts through a node and averaging the evaluations obtained at the ends of those rollouts. Over time, the algorithm focuses a larger share of its search effort on the parts of the search tree that appear promising based on earlier rollouts, which lets it gradually refine its approximate evaluations in those parts of the tree.

Additionally, the algorithm gradually grows a tree by storing a number of nodes (typically one) in memory for every rollout. The parts of the tree that are stored in memory are not traversed using the (semi-)random rollout strategy, but using a different ("selection") strategy, commonly UCB1. This guarantees that, given an infinite amount of computation time, the algorithm converges to optimal solutions.
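
To make those phases concrete, here is a minimal Python sketch of MCTS with UCB1 as the selection strategy. The GameState interface it assumes (legal_moves, apply, is_terminal, result) and the constants used are illustrative assumptions, not a reference implementation:

    import math
    import random

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state = state              # game state this node represents
            self.parent = parent            # None for the root
            self.move = move                # move that led here from the parent
            self.children = []
            self.untried_moves = list(state.legal_moves())
            self.visits = 0
            self.total_reward = 0.0

        def uct_child(self, c=1.41):
            # UCB1: average reward plus an exploration bonus.
            return max(self.children,
                       key=lambda ch: ch.total_reward / ch.visits
                       + c * math.sqrt(math.log(self.visits) / ch.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: walk down the stored tree using UCB1.
            while not node.untried_moves and node.children:
                node = node.uct_child()
            # 2. Expansion: store one new node per rollout.
            if node.untried_moves:
                move = node.untried_moves.pop()
                child = Node(node.state.apply(move), parent=node, move=move)
                node.children.append(child)
                node = child
            # 3. Simulation: finish the rollout with (semi-)random moves.
            state = node.state
            while not state.is_terminal():
                state = state.apply(random.choice(state.legal_moves()))
            reward = state.result()  # e.g. +1 win, 0 draw, -1 loss
            # 4. Backpropagation: update statistics along the selected path.
            while node is not None:
                node.visits += 1
                node.total_reward += reward
                node = node.parent
        # Recommend the most-visited root child ("robust child").
        return max(root.children, key=lambda ch: ch.visits).move

In games with two or more adversarial players, the reward also has to be interpreted from the perspective of the player to move at each node (for example, by negating it at alternating plies during backpropagation); several of the questions listed below deal with exactly that detail.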

An extensive survey, describing many enhancements and applications of the algorithm, can be found in:

  • Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 1, pp. 1-43.

The algorithm was originally proposed in:

  • Kocsis, L. and Szepesvári, C. (2006). Bandit Based Monte-Carlo Planning. Proceedings of the 17th European Conference on Machine Learning (ECML 2006) (eds. J. Fürnkranz, T. Scheffer, and M. Spiliopoulou), Vol. 4212 of Lecture Notes in Computer Science, pp. 282-293, Springer Berlin Heidelberg.
  • Coulom, R. (2007). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Computers and Games (eds. H. J. van den Herik, P. Ciancarini, and H. H. L. M. Donkers), Vol. 4630 of Lecture Notes in Computer Science, pp. 72-83, Springer Berlin Heidelberg.
79 questions
24 votes, 1 answer

Monte Carlo Tree Search agent in a game of Isolation - Debug Suggestions

TL;DR: My MCTS agent implementation runs without errors locally, achieving win rates of >40% against heuristic-driven minimax, but fails the autograder, which must be passed before the project can be submitted. The autograder throws IndexError:…
10 votes, 1 answer

How is Monte Carlo Tree Search implemented in practice?

I understand, to a certain degree, how the algorithm works. What I don't fully understand is how the algorithm is actually implemented in practice. I'm interested in understanding what optimal approaches would be for a fairly complex game (maybe…
7 votes, 2 answers

Transposition table in Monte Carlo Tree Search algorithm: unintended effect on UCT score

So I implemented a transposition table in a Monte Carlo Tree Search algorithm using UCT. This allows keeping a cumulative reward value for a game state, no matter where and how many times it is encountered throughout the tree. This increases…
6 votes, 2 answers

How does Monte Carlo Tree Search work?

Trying to learn MCTS using YouTube videos and papers like this one: http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf. However, I am not having much luck understanding the details beyond the high-level theoretical…
jiminssy
5 votes, 1 answer

Setting priority queue values to optimize the probability of finding a 'gift'

I have a priority queue of "door numbers". I get the next door number from the priority queue (i.e. the door with the lowest corresponding priority value), and then open the door. Behind the door, there may be a gift or not. Based on the presence /…
5 votes, 2 answers

How do I make my AI algorithm play 9-board tic-tac-toe?

To make it easy for others to help me, I have put all the code here: https://pastebin.com/WENzM41k. It starts with two agents competing against each other. I'm trying to implement Monte Carlo tree search to play 9-board tic-tac-toe in Python. The…
Vera
4 votes, 1 answer

Is Monte Carlo learning policy iteration or value iteration (or something else)?

I am taking a Reinforcement Learning class and I don't understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled? Should/can it…
4 votes, 1 answer

Monte Carlo Tree Search Tic-Tac-Toe -- Poor Agent

I'm trying to implement Monte Carlo tree search to play tic-tac-toe in Python. My current implementation is as follows: I have a Board class that handles alterations to the tic-tac-toe board. The state of the board is represented by a 2x3x3 numpy…
3 votes, 0 answers

Why doesn't this Monte Carlo Tree Search algorithm work properly?

PROBLEM: I'm writing a Monte-Carlo tree search algorithm to play chess in Python. I replaced the simulation stage with a custom evaluation function. My code looks correct, but for some reason it acts strangely. It recognizes instant wins easily enough but…
3 votes, 1 answer

How to restore a previous state in a gym environment

I'm trying to implement MCTS on OpenAI's Atari gym environments, which requires the ability to plan: acting in the environment and restoring it to a previous state. I read that this can be done with the RAM version of the games: recording the…
3 votes, 1 answer

How do I effectively parallelize AlphaZero on the GPU?

I'm implementing a version of AlphaZero (AlphaGo's most recent incarnation) to be applied to some other domain. The crux of the algorithm is a Monte Carlo Tree Search of the state space (CPU) interleaved with 'intuition' (probabilities) from a…
3 votes, 1 answer

Is Monte Carlo Tree Search an appropriate method for this problem size (large action/state space)?

I'm doing research on a finite-horizon decision problem with t = 1, ..., 40 periods. In every time step t, the (only) agent has to choose an action a(t) ∈ A(t) while in state s(t) ∈ S(t). The chosen action a(t) in state s(t) affects the…
3 votes, 2 answers

Monte Carlo tree search - handling game-ending nodes

I have implemented MCTS for a 4-player game, which is working well, but I'm not sure I understand expansion when the game-ending move is in the actual tree rather than in the rollout. At the start of the game, winning/losing positions are only…
Pajh
3 votes, 1 answer

Monte Carlo Tree Search - intuition behind the child selection function for two-player games with opposite goals

A simple question on the hello-world example of MCTS for tic-tac-toe. Let's assume we are given a board and want to make an optimal decision. As I understand it, the choice of consecutive nodes during simulation (until a leaf is met) is determined by a…
2 votes, 1 answer

Monte Carlo Tree Search: Getting a value from the rollout

I am currently writing an implementation of Monte Carlo Tree Search for a strategy game AI, and have a question about the Rollout (simulation phase). Descriptions of the algorithm suggest that you should run a simulation until a terminal state is…
DrMcCleod