I am currently writing an implementation of Monte Carlo Tree Search for a strategy-game AI, and I have a question about the rollout (simulation) phase.
Descriptions of the algorithm say you should run each simulation until a terminal state is reached, but this is impractical when the search space is large and time is limited. In my case, I cap the number of simulation steps at a fixed value (and stop early if a terminal state is reached first).
At each step of the simulation I evaluate the state, but since a simulation is a sequence of random actions, the evaluated value can rise and fall over the course of a run. My question: for a simulation that ends in a non-terminal state, should I return the evaluation of the last state reached, or the best evaluation observed anywhere during that run?
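To make the two options concrete, here is a minimal sketch of the kind of depth-limited rollout I mean, tracking both candidate return values side by side. The names `evaluate`, `step`, and `is_terminal` are hypothetical stand-ins for my game's actual functions, and the toy domain below is just for illustration:

```python
import random

def rollout(state, max_steps, evaluate, step, is_terminal, rng):
    """Depth-limited random rollout; returns (last evaluation, best evaluation)."""
    last_value = evaluate(state)
    best_value = last_value
    for _ in range(max_steps):
        if is_terminal(state):
            break
        state = step(state, rng)              # apply one random action
        last_value = evaluate(state)          # evaluation of the state we are in now
        best_value = max(best_value, last_value)  # best evaluation seen so far
    return last_value, best_value

# Toy domain: the state is an integer, 0 is the (terminal) goal,
# and each random action moves the state by +1 or -1.
rng = random.Random(0)
evaluate = lambda s: -abs(s)                  # closer to 0 is better
step = lambda s, r: s + r.choice((-1, 1))
is_terminal = lambda s: s == 0

last, best = rollout(5, 20, evaluate, step, is_terminal, rng)
print(last, best)
```

By construction `best` is always at least `last`, so the choice only matters when the random walk drifts away from a good state it passed through; that drift is exactly what I am unsure how to treat.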