I will start with some basics and move to harder stuff later.
Basic agent and a testing framework
No matter what approach you take, you need to start with something really simple and dumb. The best dumb agent is a random one (generate all possible moves, select one at random). It will serve as a baseline against which you compare all your other agents. You also need a solid framework for comparison: something that takes various agents, plays some number of games between them, and returns a matrix of results. Based on those results, you calculate the fitness of each agent. For example, a function tournament(agent1, agent2, agent3, 500) will play 500 games between each pair of agents (each taking turns playing first and second) and return something like:
x -0.01 -1.484 | -1.485
0.01 x -1.29 | -1.483
1.484 1.29 x | 2.774
Here, for example, I use a scoring function of 2 points for a win and 1 point for a draw, and at the end I just sum everything up to get the fitness. This table immediately tells me that agent3 is the best and that agent1 is not really different from agent2.
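A minimal sketch of such a framework, assuming tic-tac-toe as a stand-in game. The names random_agent, first_free_agent, play_game and the list-based tournament(agents, games) signature are illustrative assumptions, and the scores are averaged per game, so the numbers will not be on the same scale as the table above.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_agent(board, mark):
    """Baseline agent: generate all legal moves, select one at random."""
    return random.choice(legal_moves(board))

def first_free_agent(board, mark):
    """Another dumb agent (hypothetical): always take the first free cell."""
    return legal_moves(board)[0]

def play_game(first, second):
    """Return 1 if `first` wins, -1 if `second` wins, 0 for a draw."""
    board = [" "] * 9
    players = [(first, "X"), (second, "O")]
    for turn in range(9):
        agent, mark = players[turn % 2]
        board[agent(board, mark)] = mark
        if winner(board) == mark:
            return 1 if turn % 2 == 0 else -1
    return 0

def tournament(agents, games, win=2.0, draw=1.0):
    """Play `games` games for every ordered pair of agents.

    Returns (matrix, fitness): matrix[i][j] is agent i's average points
    per game against agent j (None on the diagonal), and fitness[i] is
    the sum of row i.
    """
    n = len(agents)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            points = 0.0
            for g in range(games):
                # Alternate who moves first so neither side is favoured.
                if g % 2 == 0:
                    result = play_game(agents[i], agents[j])
                else:
                    result = -play_game(agents[j], agents[i])
                points += win if result == 1 else draw if result == 0 else 0.0
            matrix[i][j] = points / games
    fitness = [sum(v for v in row if v is not None) for row in matrix]
    return matrix, fitness
```

Calling tournament([random_agent, first_free_agent], 500) gives you the matrix and one fitness number per agent; any new agent just has to be a function taking (board, mark) and returning a move.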
So once these two important things are set up you are ready to experiment with your evaluation functions.
Let's start with selecting features
First of all, you need to create a not-terrible evaluation function. By this I mean that the function should correctly identify three important outcomes (win/draw/loss). This sounds obvious, but I have seen a significant number of bots whose creators were not able to set up these three outcomes correctly.
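As a rough illustration of getting win/draw/loss right, here is a sketch for tic-tac-toe (my choice of game, not something the approach depends on). The constant WIN_SCORE and the line-counting heuristic are assumptions; the point is only that terminal results are checked first and that the heuristic can never outrank a real win or loss.

```python
WIN_SCORE = 1000.0  # assumed bound; must exceed any heuristic value

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board, me):
    """Score `board` from the point of view of player `me` ("X" or "O")."""
    opponent = "O" if me == "X" else "X"
    w = winner(board)
    if w == me:
        return WIN_SCORE           # win
    if w == opponent:
        return -WIN_SCORE          # loss
    if " " not in board:
        return 0.0                 # draw: board full, nobody won
    # Non-terminal position: a simple heuristic that stays strictly
    # inside (-WIN_SCORE, WIN_SCORE). Count marks on still-open lines.
    score = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if opponent not in cells:
            score += cells.count(me)
        if me not in cells:
            score -= cells.count(opponent)
    return float(score)
```

The heuristic here can reach at most 24 in absolute value (8 lines times 3 cells), so a finished game always dominates any unfinished one.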
Then you use your human ingenuity to find some features of the game state. The first thing to do is to speak with a game expert and ask how they assess a position.
If you do not have an expert, or you just created the rules of your game 5 minutes ago, do not underestimate a human's ability to spot patterns. Even after playing a couple of games, a smart person can give you ideas about how they should have played (which does not mean they can implement those ideas). Use these ideas as features.
At this point you do not really need to know how these features affect the game. Examples of features: piece values, piece mobility, control of important positions, safety, total number of possible moves, closeness to the finish.
After you have coded up these features and tried them separately to see what works best (do not hurry to discard features that do not perform reasonably well by themselves; they might be helpful in conjunction with others), you are ready to experiment with combinations.
Building better evaluations by combining and weighting simple features. There are a couple of standard approaches.
Create an uber function based on various combinations of your features. It can be linear, eval = f_1 * a_1 + ... + f_n * a_n (f_i are features, a_i are coefficients), but it can be anything. Then instantiate many agents with completely random weights for this evaluation function and use a genetic algorithm to play them against each other. Compare the results using the testing framework, discard a couple of clear losers, and mutate a couple of winners. Repeat the process. (This is a rough outline; read more about GAs.)
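The loop described above could look roughly like this. It is a sketch under several assumptions: tic-tac-toe as the game, three made-up features, one-ply greedy agents, and a mutation-only scheme with Gaussian noise (no crossover). All function names and parameters are hypothetical.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def features(board, me):
    """Hypothetical features f_i: centre control, corners, open two-in-a-rows."""
    opp = "O" if me == "X" else "X"
    def open_twos(p, q):
        rows = [[board[i] for i in line] for line in LINES]
        return sum(1 for r in rows if r.count(p) == 2 and q not in r)
    centre = (board[4] == me) - (board[4] == opp)
    corners = sum((board[i] == me) - (board[i] == opp) for i in (0, 2, 6, 8))
    return [centre, corners, open_twos(me, opp) - open_twos(opp, me)]

def evaluate(board, me, weights):
    """Linear evaluation: eval = f_1 * a_1 + ... + f_n * a_n."""
    w = winner(board)
    if w is not None:
        return 1000.0 if w == me else -1000.0
    return sum(f * a for f, a in zip(features(board, me), weights))

def make_agent(weights):
    """One-ply greedy agent: play the move with the best evaluation."""
    def agent(board, me):
        def score(move):
            nxt = board[:]
            nxt[move] = me
            return evaluate(nxt, me, weights)
        return max((i for i, c in enumerate(board) if c == " "), key=score)
    return agent

def play_game(first, second):
    """Return 1 if `first` wins, -1 if `second` wins, 0 for a draw."""
    board = [" "] * 9
    for turn in range(9):
        agent, mark = (first, "X") if turn % 2 == 0 else (second, "O")
        board[agent(board, mark)] = mark
        if winner(board) == mark:
            return 1 if turn % 2 == 0 else -1
    return 0

def round_robin(population):
    """Fitness per weight vector: 2 points for a win, 1 for a draw."""
    agents = [make_agent(w) for w in population]
    scores = [0.0] * len(agents)
    for i, a in enumerate(agents):
        for j, b in enumerate(agents):
            if i == j:
                continue
            result = play_game(a, b)  # agents are deterministic: one game per ordering
            scores[i] += 2 if result == 1 else 1 if result == 0 else 0
            scores[j] += 2 if result == -1 else 1 if result == 0 else 0
    return scores

def evolve(pop_size=8, n_features=3, generations=5, keep=4, sigma=0.3):
    population = [[random.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = round_robin(population)
        ranked = [w for _, w in sorted(zip(scores, population),
                                       key=lambda p: p[0], reverse=True)]
        survivors = ranked[:keep]                      # discard the losers
        children = [[a + random.gauss(0, sigma) for a in random.choice(survivors)]
                    for _ in range(pop_size - keep)]   # mutate some winners
        population = survivors + children
    return population[0]  # top-ranked weights from the last evaluated generation
```

evolve() returns one weight vector; in practice you would want a larger population, more games per pairing if your agents are not deterministic, and probably crossover as well.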
Use the back-propagation idea from neural networks to propagate the error from the end of the game back through the network and update its weights. You can read more about how this was done for backgammon (I have not written anything similar myself, so sorry for the brevity).
You can work without an evaluation function! This might sound insane to someone who has only heard of minimax/alpha-beta, but there are methods that do not require an evaluation at all. One of them is called Monte Carlo Tree Search, and as the Monte Carlo in the name suggests, it uses a lot of random game plays to generate a tree (they do not have to be random; the playouts can use your previous good agents). This is a huge topic by itself, so I will give you my really high-level explanation. You start with a root and create your frontier, which you try to expand. Once you expand something, you just play randomly until you reach a leaf (the end of the game). Having got the result from the leaf, you back-propagate it up the tree. Do this many, many times and collect statistics about each child of the current frontier. Select the best one. There is significant theory about how to balance exploration and exploitation; a good thing to read up on is UCT (Upper Confidence bounds applied to Trees), which builds on the UCB1 bandit algorithm.
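To make the four steps concrete (select, expand, random playout, back-propagate), here is a small sketch on a toy game I picked for brevity: single-pile Nim, where each player takes 1 to 3 stones and whoever takes the last stone wins. The Node class, the function names, and the exploration constant c=1.4 are all assumptions; the selection rule is the usual UCB1 formula, wins/visits + c * sqrt(ln(parent visits) / visits).

```python
import math
import random

def moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones          # stones left in the pile
        self.player = player          # player to move next (0 or 1)
        self.parent = parent
        self.move = move              # move that led to this node
        self.children = []
        self.untried = moves(stones)  # moves not yet expanded
        self.visits = 0
        self.wins = 0.0               # wins for the player who moved INTO this node

def uct_child(node, c=1.4):
    """Pick the child maximising the UCB1 score."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    """Random playout from a non-terminal state; returns the winner."""
    while True:
        stones -= random.choice(moves(stones))
        if stones == 0:
            return player             # this player took the last stone
        player = 1 - player

def mcts(stones, player, iterations=2000):
    """Return the move with the most visits after `iterations` simulations."""
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one new child if possible.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: play randomly to the end of the game.
        if node.stones == 0:
            win = 1 - node.player     # whoever just moved took the last stone
        else:
            win = rollout(node.stones, node.player)
        # 4. Back-propagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            if win == 1 - node.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, mcts(5, 0) asks for player 0's move with 5 stones left; with enough iterations the search should tend toward leaving the opponent a multiple of 4. Replacing rollout with one of your earlier agents is the "it does not have to be random" variant mentioned above.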