From many blogs and this one https://web.archive.org/web/20160308070346/http://mcts.ai/about/index.html We know that the process of MCTS algorithm has 4 steps.
- Selection: Starting at root node R, recursively select optimal child nodes until a leaf node L is reached.
What does leaf node L mean here? I thought it should be a node representing the terminal state of the game, or another word which ends the game. If L is not a terminal node (one end state of the game), how do we decide that the selection step stops on node L?
- Expansion: If L is a not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
From this description I realise that obviously my previous thought incorrect.
Then if L is not a terminal node, it implies that L should have children, why not continue finding a child from L at the "Selection" step?
Do we have the children list of L at this step?
From the description of this step itself, when do we create one child node, and when do we need to create more than one child nodes? Based on what rule/policy do we select node C?
- Simulation: Run a simulated playout from C until a result is achieved.
Because of the confusion of the 1st question, I totally cannot understand why we need to simulate the game. I thought from the selection step, we can reach the terminal node and the game should be ended on node L in this path. We even do not need to do "Expansion" because node L is the terminal node.
- Backpropagation: Update the current move sequence with the simulation result.
Fine. Last question, from where did you get the answer to these questions?
Thank you
BTW, I also post the same question https://ai.stackexchange.com/questions/16606/how-to-understand-the-4-steps-of-monte-carlo-tree-search