I am a little confused about how the MCTS "Tree Policy" is implemented. Every paper or article I read talks about going down the tree from the current game state (in MCTS terminology: the root, for the player about to make a move). My question is: how am I selecting the best child when I am at the MIN player's level (assuming I am the MAX player)? Even if I select some particular action that MIN might take, and my search tree gets deeper through that node, the MIN player, during its turn, might just as well choose a different node. (If the MIN player is an amateur human, it might well choose a node that is not the best one.) That seems to make all of MAX's work in propagating through that node futile, since MIN has chosen a different node. The steps I am referring to are from https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ , where the tree policy diagram https://jeffbradberry.com/images/mcts_selection.png makes me believe they are executing it from a single player's perspective.
-
I'm not seeing any Python in the question. – Peter Wood Feb 17 '17 at 15:57
-
Exploitative play requires opponent modelling. For most games it's turned out that assuming the opponent plays optimally is good enough. Poker might be an exception. – Paul Hankin Feb 17 '17 at 16:26
-
Sorry, Peter, for the tag! I am new to SE and I code mostly in Python. Now I realize it was irrelevant. – CS101 Feb 17 '17 at 19:08
-
Paul, then when I implement the "Tree Policy", should I select the best child from MIN's point of view when I am at the level where the MIN player would make a move? – CS101 Feb 17 '17 at 19:08
-
@AvisekNaug Yes you try to pick the best move for the MIN player. – Paul Hankin Feb 17 '17 at 20:01
2 Answers
To implement MCTS for a two-player game, you can simply flip the sign of the reward at every step of back-propagation, which is a one-line change in the code.
This means we are trying to maximize the reward in every layer, but when we propagate the reward up the tree, a positive reward for your opponent becomes a negative reward for you once you reach your layer.
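A minimal sketch of what that looks like, assuming each node stores a parent pointer, a visit count, and a cumulative reward (the node layout and function names here are illustrative, not from the answer):

```python
# Illustrative MCTS back-propagation with the sign flip described above.
# Node structure and names are assumptions made for this sketch.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_reward = 0.0

def backpropagate(node, reward):
    """Walk from the simulated leaf back to the root, flipping the
    reward's sign at each level so every node accumulates value from
    the point of view of the player whose turn it is at that node."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        reward = -reward  # a good outcome for one player is bad for the other
        node = node.parent
```

With that convention, the selection step can always pick the child with the highest average reward (plus an exploration term), because at MIN's layers the stored values are already from MIN's point of view.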

For MCTS, you need some way of generating a reasonable estimate of the probability distribution of possible moves. For AlphaGo [1], this is the fast rollout probability, $p_\pi$ in the paper, which takes a state and outputs a rough probability distribution over all possible moves. The AlphaGo team implemented this as a shallow neural net trained first on expert games, and then improved by playing against itself.
[1] http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
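As a rough sketch of the interface such a policy provides (the uniform distribution below is only a placeholder, not AlphaGo's $p_\pi$, and all function names are made up for illustration):

```python
# Sketch of a rollout policy used to play out simulations in MCTS.
# The uniform distribution is a stand-in; AlphaGo's p_pi is a shallow
# network trained on expert games, which this does not reproduce.
import random

def rollout_policy(state, legal_moves):
    """Return a probability for each legal move; a real implementation
    would query a trained model here."""
    return {move: 1.0 / len(legal_moves) for move in legal_moves}

def simulate(state, legal_moves, apply_move, is_terminal, result):
    """Play the game out to the end by sampling moves from the rollout
    policy, then return the final result for back-propagation."""
    while not is_terminal(state):
        probs = rollout_policy(state, legal_moves(state))
        move = random.choices(list(probs), weights=list(probs.values()))[0]
        state = apply_move(state, move)
    return result(state)
```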

-
So you mean it would not affect my gameplay because, whether MIN plays a different move or the move in my preferred direction, I would be running MCTS again either way? – CS101 Feb 17 '17 at 19:10
-
Not quite. Obviously it's not possible to predict every one of MIN's moves without knowing your opponent perfectly, so we guess a few of the best candidates and see where they lead. Looking at the expectimax algorithm instead of plain minimax might help a bit with the intuition here. – c2huc2hu Feb 18 '17 at 16:42
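To make the expectimax suggestion concrete: instead of assuming MIN always picks the reply that is worst for you, you weight MIN's replies by how likely you think they are. A small illustrative sketch (the values and probabilities are made up):

```python
# Contrast between valuing a MIN node with minimax vs. expectimax.
# child_values are from MAX's point of view; child_probs is a guessed
# model of how likely the opponent is to pick each reply.

def minimax_value(child_values):
    """Assume the opponent plays perfectly: take the reply worst for MAX."""
    return min(child_values)

def expectimax_value(child_values, child_probs):
    """Assume an imperfect opponent: average replies by their likelihood."""
    return sum(p * v for v, p in zip(child_values, child_probs))

values = [0.9, 0.4, -0.2]                          # three possible MIN replies
print(minimax_value(values))                       # -0.2
print(expectimax_value(values, [0.5, 0.3, 0.2]))   # ~0.53
```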