James P Houghton

# Knight's Hidden Path 0 - Hidden Markov Models

14 Sep 2013

Imagine a knight on a chess board. When unconstrained by edges, he has 8 possible moves; alternately, he can choose to stay where he is. Let's number staying put as move 0 and the eight knight's moves 1 through 8, proceeding clockwise from [+1, +2] (one square right, two squares up).
Now, assume the knight starts on a particular (known) square of the board and chooses a move (0-8) at random. If the move is valid he takes it; if it would force him off the edge of the board, he stays where he is and draws a fresh random move on the next turn. In Python, we can write this as a `move` function:

```python
import numpy as np

def move(position, m):
    # `position` is a length-2 NumPy array [x, y]; calculate the new position
    if   m == 0: newpos = position
    elif m == 1: newpos = position + [1, 2]
    elif m == 2: newpos = position + [2, 1]
    elif m == 3: newpos = position + [2, -1]
    elif m == 4: newpos = position + [1, -2]
    elif m == 5: newpos = position + [-1, -2]
    elif m == 6: newpos = position + [-2, -1]
    elif m == 7: newpos = position + [-2, 1]
    elif m == 8: newpos = position + [-1, 2]

    # take the move only if it stays on the 8x8 board
    if newpos.max() <= 7 and newpos.min() >= 0:
        return newpos
    else:
        return position
```
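With this in hand we can simulate a random walk. Below is a minimal sketch (the seed and the loop structure are choices for illustration; the starting square [3, 5] is the one used in this post, and the `move` function is repeated in compact form so the snippet runs on its own):

```python
import numpy as np

def move(position, m):
    # stay put (m == 0) or attempt one of the eight knight moves
    deltas = [[0, 0], [1, 2], [2, 1], [2, -1], [1, -2],
              [-1, -2], [-2, -1], [-2, 1], [-1, 2]]
    newpos = position + np.array(deltas[m])
    return newpos if newpos.max() <= 7 and newpos.min() >= 0 else position

rng = np.random.default_rng(0)           # seeded for reproducibility
position = np.array([3, 5])              # the known starting square
path = [position]
for _ in range(100):
    position = move(position, rng.integers(9))  # draw a move 0-8 at random
    path.append(position)
path = np.array(path)                    # shape (101, 2): x, y at each step
```

Every row of `path` stays on the board, since invalid moves collapse to staying put.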

One possible realization of the knight's path (for 100 iterations) is plotted below. The first 20 steps are:

| step | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|------|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|
| x    | 3 | 2 | 2 | 4 | 4 | 4 | 4 | 4 | 5 | 3 | 1  | 1  | 1  | 1  | 0  | 1  | 1  | 3  | 4  | 2  |
| y    | 5 | 7 | 7 | 6 | 6 | 6 | 6 | 6 | 4 | 5 | 6  | 6  | 6  | 6  | 4  | 2  | 2  | 3  | 1  | 0  |
Now let's assume that every turn, the knight reports his x (horizontal) coordinate, but not his y (vertical) coordinate. Can we, knowing the starting position and the sequence of x coordinates, infer the 'hidden' y coordinates? Here's the full list of x positions:

`[3 2 2 4 4 4 4 4 5 3 1 1 1 1 0 1 1 3 4 2 2 0 0 0 0 0 0 0 0 0 0 1 1 0 2 0 1 0 1 3 2 0 0 1 0 1 1 0 0 1 3 3 5 4 4 4 4 6 4 5 4 4 6 4 2 1 0 0 0 2 2 3 2 3 4 4 4 4 4 4 4 3 1 1 3 5 5 4 5 4 6 4 5 7 6 7 7 6 6 4]`

This is an example of a dynamic problem with an unobserved state. As this particular problem takes place on a discrete state space, we can use a Hidden Markov Model.

In a Markov model, the system can exist in one of a set of states (in this case 64, one for each square on the board), and this state transitions to the next state in the sequence according to a set of fixed probabilities, collected in the transition matrix. In our system there are at most 9 states the system can transition to from each starting position (including staying at the same location), and we can calculate the probability of moving from one position to another:

```python
def transprob(x1, y1, x2, y2):
    # each of the 9 candidate moves is drawn with probability 1/9;
    # count how many of them land the knight on (x2, y2)
    count = 0.
    for potmove in range(9):
        nx, ny = move(np.array([x1, y1]), potmove)
        if nx == x2 and ny == y2:
            count += 1
    return count / 9
```

And then build a transition matrix giving the probability of moving from each state to every other possible state. Here we've reshaped the 8x8 board into a 1-dimensional array of 64 elements, counting rows first, so square (x, y) becomes state S = 8x + y.

```python
transmat = np.zeros((64, 64))
for S1 in range(64):
    for S2 in range(64):
        # state S corresponds to square (S // 8, S % 8)
        transmat[S1, S2] = transprob(S1 // 8, S1 % 8, S2 // 8, S2 % 8)
```

Plotting this array we see several things: first, that in the vast majority of cases the diagonal (i.e., staying in the same position) is the most likely outcome for a given state. As we expect, there are at most 9 nonzero entries in each row, and the matrix is symmetric, meaning that the probability of transitioning from state A to state B equals the probability of transitioning from state B to state A.
We also see that every row contains at least two nonzero values, so the system won't get stuck in any one position, and every column contains at least one nonzero value, so every point in the state space can be reached from some other state.
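These structural claims are easy to verify numerically. The sketch below rebuilds the transition matrix from scratch (repeating the definitions above in compact form so it stands alone) and checks the row sums, the symmetry, and the sparsity:

```python
import numpy as np

def move(position, m):
    deltas = [[0, 0], [1, 2], [2, 1], [2, -1], [1, -2],
              [-1, -2], [-2, -1], [-2, 1], [-1, 2]]
    newpos = position + np.array(deltas[m])
    return newpos if newpos.max() <= 7 and newpos.min() >= 0 else position

def transprob(x1, y1, x2, y2):
    # fraction of the 9 equally likely moves that land on (x2, y2)
    return sum((move(np.array([x1, y1]), p) == [x2, y2]).all()
               for p in range(9)) / 9

transmat = np.array([[transprob(S1 // 8, S1 % 8, S2 // 8, S2 % 8)
                      for S2 in range(64)] for S1 in range(64)])

assert np.allclose(transmat.sum(axis=1), 1)              # rows are distributions
assert np.allclose(transmat, transmat.T)                 # P(A->B) == P(B->A)
assert (np.count_nonzero(transmat, axis=1) <= 9).all()   # at most 9 targets
assert (np.count_nonzero(transmat, axis=1) >= 2).all()   # never stuck
```

From a corner square like (0, 0), only two knight moves are legal, so the other seven draws collapse to staying put and the diagonal entry is 7/9.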

We now need to define a matrix of 'emission probabilities' - meaning the likelihood that a particular value of x will be observed given the state the system is in. For our problem this is relatively straightforward, as we just read the x value straight out of the state coordinates:

```python
emissionprob = np.zeros((64, 8))
for State in range(64):
    for Emission in range(8):
        # a state's emission is simply its x coordinate
        if State // 8 == Emission:
            emissionprob[State, Emission] = 1
```
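As a quick sanity check (a standalone sketch repeating the construction above), every row of this matrix should be a one-hot distribution over the eight possible x readings:

```python
import numpy as np

emissionprob = np.zeros((64, 8))
for State in range(64):
    # with row-first reshaping, state S sits at square (S // 8, S % 8)
    emissionprob[State, State // 8] = 1

assert np.allclose(emissionprob.sum(axis=1), 1)  # exactly one symbol per state
assert emissionprob[19, 2] == 1                  # state 19 is square (x=2, y=3)
```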

If we feed these assumptions into the scikit-learn multinomial Hidden Markov Model package, using the Viterbi algorithm, we can calculate the most likely path through the state space given the observed output:
```python
# the knight's start square is known with certainty
startprob = np.zeros(64)
startprob[positionlist['x'].iloc[0] * 8 + positionlist['y'].iloc[0]] = 1

model = hmm.MultinomialHMM(n_components=64, transmat=transmat, startprob=startprob)
model.n_symbols_ = 8
model.emissionprob_ = emissionprob

# Viterbi decoding: the single most likely sequence of hidden states
predictions = model.predict(positionlist['x'].tolist())
```

We can also compute the most likely state at each time step, using the forward-backward algorithm:

```python
proba = model.predict_proba(positionlist['x'].tolist())
```

Here we plot the predicted and actual y values for each step in the simulation (ignoring x, which we already know), and color each candidate y value by its probability.
We see that the predicted most likely path doesn't always correspond to the actual path - this is to be expected, as the system has a random element and is perfectly capable of choosing less likely moves.

How good are these predictions? As each y value has a likelihood associated with it, we can group those likelihoods into bins, and see what fraction of the predictions in that bin were true. Here we look at 200,000 steps, for a total of 1,600,000 predictions grouped into bins of 2% confidence:
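The grouping can be sketched as follows (a hypothetical `calibration` helper, not the original analysis code): truncate each predicted probability into a 2%-wide bin and compute the fraction of predictions in the bin that came true.

```python
import numpy as np

def calibration(probs, hits, width=0.02):
    """Fraction of predictions in each confidence bin that were correct."""
    probs = np.asarray(probs, dtype=float)
    hits = np.asarray(hits, dtype=bool)
    nbins = int(round(1 / width))
    bins = np.clip((probs / width).astype(int), 0, nbins - 1)
    accuracy = np.full(nbins, np.nan)    # NaN where a bin has no predictions
    for b in np.unique(bins):
        accuracy[b] = hits[bins == b].mean()
    return accuracy

# toy example: five predictions at 81% confidence, four of which came true
acc = calibration([0.81] * 5, [True, True, True, True, False])
# acc[40] == 0.8, close to the stated 81% confidence
```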
We see that our predictions are fairly well calibrated, as their measured accuracy tracks well with their expected accuracy. There is a bit more noise near the upper end of the chart, as there are fewer predictions made with high confidence. In the chart below, we see how frequent each of these predictions is in this enormous test set:
It would be interesting in the future to see how accurate we could make our predictions if we allowed our algorithm to peek at the y values every once in a while. We might find a trade-off between accuracy and inspection effort that holds lessons for other applications.