
I am currently writing an implementation of Monte Carlo Tree Search (MCTS) for a strategy-game AI, and I have a question about the rollout (simulation) phase.

Descriptions of the algorithm suggest running a simulation until a terminal state is reached, but this is impractical with a large search space and finite time. In my case, I cap the number of simulation steps at a fixed value (and finish early if a terminal state is reached).

At each step in the simulation I evaluate the state, but since a simulation consists of a sequence of random actions, the evaluated value can rise and fall over the course of a run. My question: for a simulation that ends in a non-terminal state, should I return the last state evaluation, or the best state evaluation observed during that run?
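To make the setup concrete, here is a minimal sketch of the depth-limited rollout described above. The helpers `is_terminal`, `legal_actions`, `apply`, and `evaluate` are hypothetical game-specific functions (not from any library), and the step cap is an arbitrary illustrative value; the sketch just records both candidate return values:

```python
import random

MAX_ROLLOUT_STEPS = 50  # hypothetical depth cutoff


def rollout(state):
    """Depth-limited random rollout that tracks both candidate values.

    Assumes game-specific helpers `is_terminal`, `legal_actions`,
    `apply`, and `evaluate` are defined elsewhere.
    """
    last_value = evaluate(state)
    best_value = last_value
    for _ in range(MAX_ROLLOUT_STEPS):
        if is_terminal(state):
            break  # finish early on a terminal state
        state = apply(state, random.choice(legal_actions(state)))
        last_value = evaluate(state)
        best_value = max(best_value, last_value)
    # The question: back up last_value, or best_value?
    return last_value, best_value
```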

DrMcCleod

1 Answer


Typically you would use the value at the end of the simulation. That said, MCTS is regularly adapted to many different domains, so you are free to adapt it in whatever way gives you the best performance.

This idea was, to my knowledge, first proposed for the game of Amazons, where the authors used a random walk of "about 6 moves" before applying the evaluation function.
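As a sketch of that variant, assuming the same hypothetical game helpers as in the question (`is_terminal`, `legal_actions`, `apply`, `evaluate`), the rollout simply walks a few random moves and then returns the evaluation of the state it ends in:

```python
import random

ROLLOUT_DEPTH = 6  # "about 6 moves", as in the Amazons work


def short_rollout(state):
    """Fixed-depth random walk followed by a single state evaluation.

    Game-specific helpers are assumed, as in the question's sketch.
    """
    for _ in range(ROLLOUT_DEPTH):
        if is_terminal(state):
            break
        state = apply(state, random.choice(legal_actions(state)))
    # Return the value at the end of the (possibly truncated) simulation.
    return evaluate(state)
```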

Nathan S.