
I am having a tough time figuring out how to use Kevin Murphy's HMM Toolbox. It would be a great help if anyone with experience with it could clarify some conceptual questions. I have somewhat understood the theory behind HMMs, but it's confusing how to actually implement it and specify all the parameter settings.

There are 2 classes, so we need 2 HMMs.
Let's say the training sequences are: class 1, O1 = {4 3 5 1 2}, and class 2, O2 = {1 4 3 2 4}.
Now the system has to classify an unknown sequence O3 = {1 3 2 4 4} as either class 1 or class 2.

  1. What is going to go in obsmat0 and obsmat1?
  2. What is the syntax for specifying the transition probabilities transmat0 and transmat1?
  3. What will the variable data be in this case?
  4. Would the number of states be Q = 5, since there are five unique numbers/symbols used?
  5. Is the number of output symbols 5?
  6. How do I set the transition probabilities transmat0 and transmat1?
– George Roy

2 Answers


Instead of answering each individual question, let me illustrate how to use the HMM toolbox with an example: the weather example, which is usually used when introducing hidden Markov models.

Basically, the states of the model are the three possible types of weather: sunny, rainy, and foggy. On any given day, we assume the weather can be only one of these values. Thus the set of HMM states is:

S = {sunny, rainy, foggy}

However, in this example we can't observe the weather directly (apparently we are locked in the basement!). Instead, the only evidence we have is whether the person who checks on us every day is carrying an umbrella or not. In HMM terminology, these are the discrete observations:

x = {umbrella, no umbrella}

The HMM model is characterized by three things:

  • The prior probabilities: a vector of probabilities of being in each state at the start of a sequence.
  • The transition probabilities: a matrix describing the probabilities of going from one weather state to another.
  • The emission probabilities: a matrix describing the probabilities of observing an output (umbrella or not) given a state (weather).

Next, we are either given these probabilities, or we have to learn them from a training set. Once that's done, we can do reasoning, like computing the likelihood of an observation sequence with respect to an HMM model (or a bunch of models, and picking the most likely one)...

1) known model parameters

Here is sample code that shows how to fill in the known probabilities to build the model:

Q = 3;    %# number of states (sun,rain,fog)
O = 2;    %# number of discrete observations (umbrella, no umbrella)

%#  prior probabilities
prior = [1 0 0];

%# state transition matrix (1: sun, 2: rain, 3:fog)
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];

%# observation emission matrix (1: umbrella, 2: no umbrella)
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];

Then we can sample a bunch of sequences from this model:

num = 20;           %# 20 sequences
T = 10;             %# each of length 10 (days)
[seqs,states] = dhmm_sample(prior, A, B, num, T);

For example, the 5th sample was:

>> seqs(5,:)        %# observation sequence
ans =
     2     2     1     2     1     1     1     2     2     2

>> states(5,:)      %# hidden states sequence
ans =
     1     1     1     3     2     2     2     1     1     1

We can evaluate the log-likelihood of the sequence:

dhmm_logprob(seqs(5,:), prior, A, B)

dhmm_logprob_path(prior, A, B, states(5,:))

or compute the Viterbi path (most probable state sequence):

vPath = viterbi_path(prior, A, multinomial_prob(seqs(5,:),B))

[figure: 5th example, observation and hidden state sequences plotted over time]

2) unknown model parameters

Training is performed using the EM algorithm, and is best done with a set of observation sequences.

Continuing with the same example, we can use the data generated above to train a new model and compare it to the original:

%# we start with a randomly initialized model
prior_hat = normalise(rand(Q,1));
A_hat = mk_stochastic(rand(Q,Q));
B_hat = mk_stochastic(rand(Q,O));  

%# learn from data by performing many iterations of EM
[LL,prior_hat,A_hat,B_hat] = dhmm_em(seqs, prior_hat,A_hat,B_hat, 'max_iter',50);

%# plot learning curve
plot(LL), xlabel('iterations'), ylabel('log likelihood'), grid on

[figure: EM learning curve, log-likelihood vs. iteration]

Keep in mind that the state ordering doesn't have to match between the two models; that's why we need to permute the states before comparing them. In this example, the trained model looks close to the original one:

>> p = [2 3 1];              %# states permutation

>> prior, prior_hat(p)
prior =
     1     0     0
ans =
      0.97401
  7.5499e-005
      0.02591

>> A, A_hat(p,p)
A =
          0.8         0.05         0.15
          0.2          0.6          0.2
          0.2          0.3          0.5
ans =
      0.75967      0.05898      0.18135
     0.037482      0.77118      0.19134
      0.22003      0.53381      0.24616

>> B, B_hat(p,[1 2])
B =
          0.1          0.9
          0.8          0.2
          0.3          0.7
ans =
      0.11237      0.88763
      0.72839      0.27161
      0.25889      0.74111
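
The permutation p above was chosen by inspecting the learned matrices. As a minimal sketch of how one might guess it automatically, you could match each original emission row to its nearest learned row; this is just a heuristic (not a toolbox function), and it assumes the learned rows are distinct enough that no state gets matched twice:

%# heuristic sketch (not part of the toolbox): guess the permutation by
%# matching each original emission row B(i,:) to the closest learned row
p = zeros(1,Q);
for i=1:Q
    [~,p(i)] = min( sum(abs(bsxfun(@minus, B_hat, B(i,:))), 2) );
end
p      %# should recover [2 3 1] for the run shown above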

There are more things you can do with hidden Markov models, such as classification or pattern recognition. You would have different sets of observation sequences belonging to different classes. You start by training a model for each set. Then, given a new observation sequence, you classify it by computing its likelihood with respect to each model and picking the class of the model with the highest log-likelihood:

argmax[ log P(X|model_i) ] over all model_i
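
For instance, here is a minimal sketch of that classification idea applied to the two-class setup in the question (the sequences are taken from the question; the choices Q = 2 hidden states, O = 5 symbols, and a single short training sequence per class are just assumptions to illustrate the mechanics, not recommended settings):

seqs1 = {[4 3 5 1 2]};      %# training sequences for class 1
seqs2 = {[1 4 3 2 4]};      %# training sequences for class 2
unknown = [1 3 2 4 4];      %# sequence to classify

Q = 2;                      %# number of hidden states (a modeling choice)
O = 5;                      %# number of distinct symbols (values 1..5)

%# train one HMM per class: random initialization followed by EM
trainData = {seqs1, seqs2};
models = cell(1,2);
for c = 1:2
    prior0 = normalise(rand(Q,1));
    A0 = mk_stochastic(rand(Q,Q));
    B0 = mk_stochastic(rand(Q,O));
    [~, prior_c, A_c, B_c] = dhmm_em(trainData{c}, prior0, A0, B0, 'max_iter',50);
    models{c} = struct('prior',prior_c, 'A',A_c, 'B',B_c);
end

%# classify: pick the class whose model gives the highest log-likelihood
loglik = cellfun(@(m) dhmm_logprob(unknown, m.prior, m.A, m.B), models);
[~, predictedClass] = max(loglik)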
– Amro
  • Thank you immensely for this valuable information. However, certain things are still unclear with respect to applying this example to my problem. If you could kindly hint at what the observations O would be in my case (the umbrella / no-umbrella case sounds too predictable; what if the decision is not binary like the umbrella one?) and what is the permutation matrix p? – George Roy Mar 18 '12 at 05:56
  • @Amro I am a bit surprised because it seems that the sequences used for training have to come stacked on top of each other in a matrix. That means they all need to have the same length, doesn't it? Isn't that an unnecessary condition? – Konstantin Schubert Mar 28 '13 at 12:43
  • @Konstantin: the train function also accepts a cell array if the sequences are of different lengths (`seqs{i}`); for example, you can use `num2cell(seqs,2)`. – Amro Mar 29 '13 at 00:29
  • @Amro could you please explain why `dhmm_em` does not take `states` as an argument when learning the model? – medvedNick Apr 03 '13 at 20:58
  • @medvedNick: `dhmm_em` implements the [Baum-Welch algorithm](http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm) to learn the HMM model when given only the emission data. If you know both the observable sequences and the corresponding hidden states, then apply straightforward counting to estimate the model parameters (count how many times each symbol is emitted from each state and how many times you transition from one state to another, then normalize the counts to get proper probabilities); a minimal sketch of this counting approach is given after these comments. – Amro Apr 03 '13 at 21:15
  • @Amro so if I want to use `mhmm_em` with gaussian mixtures, after learning I need to take matrices of `mu` and `sigma` from output and estimate `B_hat` (transition matrix) by myself as you say, right? – medvedNick Apr 03 '13 at 21:32
  • @medvedNick: I'm not sure what you mean; both versions of the train function will estimate all the model parameters (emission and transition) given only the observed data. So with `mhmm_em` you automatically get the prior probabilities, transition matrix, as well as the means, shared covariance, and coefficients of the gaussian mixtures. – Amro Apr 03 '13 at 22:10
  • @Amro you've explained how to estimate parameters taking into account `states` for discrete case, and I was trying to understand the algorithm for the mixture case.. Estimation of transition matrix will be the same, but for other parameters it will be different. My last comment was like: "Can I take some ready output of `mhmm_em` for estimating these params or should I somehow implement it by myself?". But now I see that output of EM will not be matching my states anyway.. – medvedNick Apr 03 '13 at 22:52
  • Please, I tried to test the example but I can't find the function `multinomial_prob`. I have downloaded the HMM toolbox from this URL: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm_download.html Do you have any idea, please? – researcher May 26 '13 at 15:10
  • @researcher: I can see it inside the KPMstats folder. Did you follow the instructions on that download page to add the toolbox to the MATLAB path: `addpath(genpath('C:/HMMall'))`? – Amro May 26 '13 at 15:22
  • Thank you so much for the help, it works now :) Please, how can I generate the first graph (figure 1)? – researcher May 26 '13 at 15:35
  • @researcher: I've simply used the `stairs` function to plot the sequences, then customized the labels and axis ticks. This is basic MATLAB plotting. Consult the docs if you need help getting started with plotting in MATLAB. – Amro May 26 '13 at 16:13
  • Thank you so much for the help. `stairs(t,O), xlabel('Time'), ylabel('Symbols'), grid on`, is that it? Please just guide me, I don't get the same figure as you ;( – researcher May 26 '13 at 16:47
  • @researcher: again, this is not related to HMMs. You are asking about MATLAB plotting functionality, which is off-topic in this case. If you need help with that, create a new question of your own and ask how to customize axis tick labels. – Amro May 26 '13 at 17:37
  • @Amro please help me with this: http://stackoverflow.com/questions/23654578/hidden-markov-model-classifying-a-sequence-in-matalb –  May 18 '14 at 17:29
  • I know this is old, but could you please explain the invocation of `viterbi_path`? (Why can't I give it the observed sequence and get back the most likely hidden state path?) – Alex Kreimer Jan 14 '17 at 07:51
  • @AlexKreimer that's exactly what it does, it's just done in two steps. If you like you can write a wrapper function for easier invocation: http://pastebin.com/P1c9z6hp. See the [docs](http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm_usage.html) for reference (scroll down to the section "Computing the most probable sequence (Viterbi)"). Also https://en.wikipedia.org/wiki/Viterbi_algorithm for more info – Amro Jan 14 '17 at 10:06
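
Following up on the comment above about estimating the parameters by counting when the hidden states are known, here is a minimal sketch of that idea using the `seqs`/`states` sampled earlier in the answer (the counting code is only an illustration, not a toolbox function):

%# sketch: estimate the parameters by counting, given both seqs and states
%# (not a toolbox function; uses the seqs/states sampled earlier)
Q = 3;  O = 2;
prior_cnt = zeros(Q,1);
trans_cnt = zeros(Q,Q);
emis_cnt  = zeros(Q,O);
[num,T] = size(states);
for n = 1:num
    prior_cnt(states(n,1)) = prior_cnt(states(n,1)) + 1;
    for t = 1:T
        emis_cnt(states(n,t), seqs(n,t)) = emis_cnt(states(n,t), seqs(n,t)) + 1;
        if t < T
            trans_cnt(states(n,t), states(n,t+1)) = ...
                trans_cnt(states(n,t), states(n,t+1)) + 1;
        end
    end
end
prior_est = normalise(prior_cnt);     %# normalize to sum to 1
A_est = mk_stochastic(trans_cnt);     %# normalize each row
B_est = mk_stochastic(emis_cnt);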

I do not use the toolbox that you mention, but I do use HTK. There is a book that describes the workings of HTK very clearly, and it is available for free:

http://htk.eng.cam.ac.uk/docs/docs.shtml

The introductory chapters might help your understanding.

I can have a quick attempt at answering #4 on your list... The number of emitting states is linked to the length and complexity of your feature vectors. However, it certainly does not have to equal the length of the array of feature vectors, as each emitting state can have a transition probability of going back into itself, or even back to a previous state, depending on the architecture (see the sketch below). I'm also not sure whether the value that you give includes the non-emitting states at the start and end of the HMM, but these need to be considered as well. Choosing the number of states often comes down to trial and error.
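
To make the topology point concrete, here is a rough sketch of two common choices for a 3-state transition matrix (the numbers are made up, purely for illustration):

%# left-to-right topology: each state loops on itself or moves forward only
A_ltr = [0.6 0.4 0.0;
         0.0 0.7 0.3;
         0.0 0.0 1.0];

%# ergodic topology: every state can reach every other, including going back
A_erg = [0.5 0.3 0.2;
         0.2 0.5 0.3;
         0.3 0.3 0.4];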

Good luck!

– learnvst
  • Thank you for the response. However, the HTK toolbox is even more complex than this one! Also, when considering the number of states, do we include states that have a self-loop or transitions back and forth, like in the ergodic HMM model? I am not aware of this concept. If you could explain with an example covering different cases of states, it would be immensely helpful. – George Roy Mar 16 '12 at 17:05