Walk the Line - MCTS & Evaluation Functions
As Monte Carlo Tree Search (MCTS) is becoming more popular in the world of Game AI, AI programmers are starting to discover when, and more importantly when not to use it.
A very basic outline of MCTS is that at each iteration it tries to expand on the best action known thus far, simulating till the end of a game, returning and finally integrating the result.
One of the weaknesses the technique has, is that it doesn’t seem to perform well in those problems where the path to victory is narrow.
A trivial example of this problem would be Tic Tac Toe: A single misstep will lead to defeat.
Obviously MCTS will be able to handle games as Tic Tac Toe with ease, but we’ll use it to provide some insight into the problem at hand.
A simple evaluation function in MCTS would return -1 if the simulation returned a loss and +1 for a win. The values returned by the simulation are then added to the current value of the action that was selected for the simulation.
Most Academic AI programmers stop here, and focus on improving the technique itself. In this, Game AI programmers are better armed than their academic counterparts. We’ve got plenty of experience in expressing behavior through functions, structuring them exactly so that several simple functions will give the exact behavior we want.
However, we’ve primarily got experience in using evaluation functions in the context of Finite State Machines (FSM) or Behavior Trees.
The next step most Game AI programmers now take is to structure the evaluation function to incorporate various elements of a game, combining them into a single function that is normalized over [-1 .. 1].
And disaster strikes.
The performance of the algorithm doesn’t seem to be affected, or even worse, performance drops!
The AI starts playing moves which are obviously weak or even outright wrong.
The core of MCTS is that the technique is based on the assumption that if a move really is better than others, this will show if we play it often enough.
By “improving” the evaluation function we’ve now blurred the distinction between a loss and a win.
As MCTS can simulate a single action thousands of times, a very small score gained consistently can very easily start to outweigh a guaranteed loss further down the line.
Make sure that you give a far greater weight to both victory and loss, as compared to general performance or components.
Simply put, a victory is not equal to the sum of various components in the game.
Getting 95 points out of 100 and getting second place shouldn’t be almost the same as getting a perfect 100 and getting first place.
Knowing how to structure an evaluation function for use in search techniques is invaluable, but more importantly, completely different from modelling as used for behavior based techniques.
With search techniques we want to alter the evaluation function such that it exploits the strengths of a technique. We are trying to alter the behavior of the search.
The following two images show how a simple adjustment can have far reaching consequences:
(The size of the circles and thickness of the edges show where MCTS spent time in the tree.)
In the first image MCTS did not look very deep into the tree, spending a lot of time on actions which proved fruitless.
In the second image we only altered the evaluation function by returning -100 for a loss instead of -1. This gave MCTS very paranoid behavior and resulted in it exploring those nodes which gave the enemy as little as opportunity as possible.
In Game Programming every nanosecond spent not rendering counts and a good evaluation function is the difference between being able to churn out good strategies in milliseconds as opposed to seconds.
It will ensure that the algorithm will take less time before converging on the optimal move.
Even in the few domains where we can spend a few seconds (!) calculating the optimal move, a well structured evaluation function will have significant impact.
Or better said; a badly structured one will cost you the game.