
I am currently writing an implementation of Monte Carlo Tree Search (MCTS) for a strategy-game AI, and I have a question about the rollout (simulation) phase.

Descriptions of the algorithm suggest running a simulation until a terminal state is reached, but this is impractical with a large search space and finite time. In my case, I cap the number of simulation steps at a fixed value (and finish early if a terminal state is reached).

At each step in the simulation I evaluate the state, but since a simulation consists of a sequence of random actions, the evaluated value can rise and fall over the course of a run. My question: for a simulation that ends in a non-terminal state, should I return the last state evaluation, or the best state evaluation observed during that run?
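To make the setup concrete, here is a minimal sketch of the depth-limited rollout described above. The helpers `is_terminal`, `legal_actions`, `apply`, and `evaluate` are hypothetical game-specific functions (not from any library), and the step cap is an arbitrary illustrative value; the sketch just records both candidate return values:

```python
import random

MAX_ROLLOUT_STEPS = 50  # hypothetical depth cutoff


def rollout(state):
    """Depth-limited random rollout that tracks both candidate values.

    Assumes game-specific helpers `is_terminal`, `legal_actions`,
    `apply`, and `evaluate` are defined elsewhere.
    """
    last_value = evaluate(state)
    best_value = last_value
    for _ in range(MAX_ROLLOUT_STEPS):
        if is_terminal(state):
            break  # finish early on a terminal state
        state = apply(state, random.choice(legal_actions(state)))
        last_value = evaluate(state)
        best_value = max(best_value, last_value)
    # The question: back up last_value, or best_value?
    return last_value, best_value
```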

DrMcCleod

1 Answer


Typically you would use the value at the end of the simulation. That said, MCTS is regularly adapted to many different domains, so you are free to adapt it in whatever way gives you the best performance.

This idea was, to my knowledge, first proposed for the game of Amazons, where the authors used a random walk of "about 6 moves" before applying the evaluation function.
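As a sketch of that variant, assuming the same hypothetical game helpers as in the question (`is_terminal`, `legal_actions`, `apply`, `evaluate`), the rollout simply walks a few random moves and then returns the evaluation of the state it ends in:

```python
import random

ROLLOUT_DEPTH = 6  # "about 6 moves", as in the Amazons work


def short_rollout(state):
    """Fixed-depth random walk followed by a single state evaluation.

    Game-specific helpers are assumed, as in the question's sketch.
    """
    for _ in range(ROLLOUT_DEPTH):
        if is_terminal(state):
            break
        state = apply(state, random.choice(legal_actions(state)))
    # Return the value at the end of the (possibly truncated) simulation.
    return evaluate(state)
```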

Nathan S.