
I know there is a lot of material related to hidden Markov models, and I have also read all the questions and answers related to this topic. I understand how an HMM works and how it can be trained; however, I am not able to solve the following problem when trying to train one for a simple dynamic gesture.

I am using an HMM implementation for OpenCV. I have looked into a previously asked question and answer here, which has really helped me in understanding and using Markov models.

I have a total of two dynamic gestures, which are symmetric to each other (swipe left and swipe right). There are a total of 5 observations, of which 4 are the different stages of the gesture and the 5th is emitted when none of these stages is occurring.

The swipe left gesture consists of the observation sequence 1->2->3->4 (which should trigger a swipe left state); likewise, the swipe right gesture consists of the sequence 4->3->2->1.

I have 25 sequences, each with 20 observations, which are used to train the hidden Markov model using the Baum-Welch algorithm.

The following are the input sequences (one per row):

1 0 1 1 0 2 2 2 2 0 0 2 3 3 3 0 0 4 4 4 
4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0 1 
4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0 
4 4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 
1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4 4 
1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4 
0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 
0 0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 
4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1 1 
4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1 
4 4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 
1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4 4 
1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4 
1 1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 
1 3 4 4 4 0 3 0 0 0 0 0 3 2 0 0 1 1 1 1 

In these sequences you can see the patterns for the swipe left and swipe right gestures.

To train the hidden Markov model I am initializing it with the following values and then calling the train function to get the output:

TRANS:
0.7 0.15 0.15
0.3 0.4 0.3
0.3 0.4 0.3

EMIS:
0.3 0.1 0.1 0.1 0.1
0.2 0.1 0.2 0.2 0.3
0.2 0.3 0.2 0.2 0.1

INIT:
0.6 0.2 0.2

After training, the output is:

TRANS:
0.81611 0.0847926 0.0990979
0.398458 0.346433 0.255109
0.371391 0.35587 0.272739

EMIS:
0.534127 0.125568 0.0824495 0.200169 0.0576869
0.294653 0.0250053 0.0500311 0.200616 0.429694
0.238808 0.075001 0.0500019 0.130455 0.505733

INIT:
0.443984 0.391323 0.164693

Using this model in my recognition program, I am not getting results. I want the system to remain in a NULL state unless one of the gestures is detected. In the transition and emission matrices I gave my guess values for both of these gestures.

What do you think I might be doing wrong? Any pointers or help?

Lastly, here is the code I am using for this (if anyone wants to have a look):

// Initial guesses for the model parameters; the train call below
// refines them in place.
double TRGUESSdata[] = {0.7, 0.15, 0.15,
                        0.3, 0.4,  0.3,
                        0.3, 0.4,  0.3};
cv::Mat TRGUESS = cv::Mat(3, 3, CV_64F, TRGUESSdata).clone();
double EMITGUESSdata[] = {0.3, 0.1, 0.1, 0.1, 0.1,
                          0.2, 0.1, 0.2, 0.2, 0.3,
                          0.2, 0.3, 0.2, 0.2, 0.1};
cv::Mat EMITGUESS = cv::Mat(3, 5, CV_64F, EMITGUESSdata).clone();
double INITGUESSdata[] = {0.6, 0.2, 0.2};
cv::Mat INITGUESS = cv::Mat(1, 3, CV_64F, INITGUESSdata).clone();

// seq is declared earlier as a CV_32S matrix with one training
// sequence per row (25 rows x 20 columns here), since
// seq.at<signed int> is used below.
std::cout << seq.rows << " " << seq.cols << std::endl;

// Read the observation sequences into seq, echoing each row.
int a = 0;
std::ifstream fin;
fin.open("observations.txt");
for (int y = 0; y < seq.rows; y++)
{
    for (int x = 0; x < seq.cols; x++)
    {
        fin >> a;
        seq.at<signed int>(y, x) = a;
        std::cout << a;
    }
    std::cout << std::endl;
}

// Baum-Welch training (up to 1000 iterations); the guess matrices are
// updated in place, so printing before and after shows the effect.
hmm.printModel(TRGUESS, EMITGUESS, INITGUESS);
hmm.train(seq, 1000, TRGUESS, EMITGUESS, INITGUESS);
hmm.printModel(TRGUESS, EMITGUESS, INITGUESS);
Here fin is used to read the observations produced by my other code.


1 Answer


What does the 0 mean in your model? It seems to me that in your data there are no direct transitions between the gesture stages; the sequences always fall back to observation 0. Try something like the following in your data for the state transition sequences:

  • 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4
  • 1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0
  • 0 0 0 0 1 1 2 2 3 3 4 4 0 0 0 0 0
  • 4 4 3 3 2 2 1 1 0 0 0 0 0 0 0 0 0
  • 4 4 4 3 3 3 2 2 2 2 2 1 1 1 1 1 1
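
As an illustration, and anticipating the per-gesture models discussed in the comments below, here is a minimal sketch of how such direct-transition sequences could be fed to the training call from the question. The hmm object, the TRGUESS/EMITGUESS/INITGUESS matrices, and the one-sequence-per-row CV_32S layout are assumptions carried over from the question's code, not a verified API:

// Minimal sketch: train a dedicated swipe-left model on the first three
// direct-transition sequences suggested above (one sequence per row).
// Assumes the hmm object and guess matrices from the question's code;
// a swipe-right model would be trained the same way on the 4->3->2->1
// sequences.
int leftData[] = {1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4,
                  1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 0, 0, 0, 0, 0};
cv::Mat leftSeq = cv::Mat(3, 17, CV_32S, leftData).clone();
hmm.train(leftSeq, 1000, TRGUESS, EMITGUESS, INITGUESS);
hmm.printModel(TRGUESS, EMITGUESS, INITGUESS);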

As a general rule:

I would recommend working with OpenCV only after you have a proof of concept in Matlab/Octave. There are two reasons for this. First, you know exactly what you want to do and how it works, so you don't waste your time implementing and debugging your theory in a 'low-level' language (compared to Matlab). Debugging algorithms in OpenCV is really time-consuming.

Second, once you know your approach works as expected, if you implement it and hit a bug (in OpenCV, C++, or Python), you know it's not your theory and not your implementation; it's the framework. It has already happened to me twice that computer scientists implemented directly from a paper (after being told not to do so), spent 80% of the remaining time debugging the algorithm without ANY success, only to find out that they didn't really get the theory, or that some submodule of OpenCV had a slight bug which degraded their results.

The link you've mentioned uses an HMM toolbox in Matlab. Try to implement and understand your problem there; it's really worth spending the time. Not only can you verify each step for correctness, you can also reuse the intermediate matrices in your OpenCV code once you have a working model.

  • Thank you for your answer. I have tried working in Matlab as well; however, the problem is not the implementation. I am not even trying to implement it on my own, just using some already built libraries. The thing I don't understand is that both gestures are symmetric, so how can I train gestures which are symmetric? Meaning 1,2,3,4 would give the same probability distribution as 4,3,2,1 if the observations contain both. This is the reason why, after training, the HMM is optimized for only one gesture. Can you give me any pointers on how to make this work? – masad Aug 22 '12 at 19:22
  • 0 means a NULL observation, which is detected by my other code that uses neural networks to identify each stage. I can modify it to not output 0 and only categorize each pose into one of the values 1,2,3,4, giving the closest matching pose. But the NULL state still needs to be detected by the HMM. – masad Aug 22 '12 at 19:25
  • You should not report the NULL state then. HMMs are used in combination, so basically you'll have 3+ models: one for the 1-2-3-4 transition, one for the 4-3-2-1 transition, and one or many modelling all the other transitions you want to mark as wrong (e.g. you make models for 1-3-2-4, 1-2-4-3, 1-4-3-2 modelling wrong states). Then you'll always have one model which matches a gesture sequence with the highest probability; see the sketch after these comments. The beauty of HMMs is that you can model a sequence with repetitions, so a sequence like 1-1-1-2-3-3-3-3-3-4 will match the 1-2-3-4 model and not the 1-4-2-3 one. – count0 Aug 23 '12 at 13:22
  • There are many other possibilities to group HMMs and model the states you don't want to recognize. Take a look at the speech recognition literature. – count0 Aug 23 '12 at 13:24
  • Thank you so much. This worked out, and I understand the training now. I only want to ask you one last thing: is it possible to model all the other states, which my system should recognize as NULL, in ONE HMM? I have trained two models, one for each gesture, and now I am planning to train a third one for the rest of the observations. – masad Aug 30 '12 at 21:33
  • All other states can be modeled as one big HMM, but it will be a more complex one. It would have to catch multiple transition possibilities at once, which means it will need multiple branches. I think one usually uses something more like an HMM-of-HMMs approach. – count0 Aug 31 '12 at 13:10
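
To make the multi-model scoring concrete, below is a minimal, self-contained sketch of the evaluation step, assuming the same matrix layout as the question (TRANS is NxN, EMIS is NxM, INIT is 1xN, all CV_64F): score a new observation sequence against each trained model with the scaled forward algorithm and pick the model with the highest log-likelihood. The logLikelihood name and the left/right model variables are hypothetical illustrations, not part of the question's library.

#include <opencv2/core/core.hpp>
#include <cmath>
#include <limits>
#include <vector>

// Scaled forward algorithm: returns log P(obs | model).
double logLikelihood(const cv::Mat &TRANS, const cv::Mat &EMIS,
                     const cv::Mat &INIT, const std::vector<int> &obs)
{
    const int N = TRANS.rows;
    std::vector<double> alpha(N), next(N);
    double logL = 0.0;

    // Initialization: alpha_1(i) = pi_i * b_i(o_1), then rescale to sum 1.
    double scale = 0.0;
    for (int i = 0; i < N; ++i) {
        alpha[i] = INIT.at<double>(0, i) * EMIS.at<double>(i, obs[0]);
        scale += alpha[i];
    }
    if (scale == 0.0)  // first observation impossible under this model
        return -std::numeric_limits<double>::infinity();
    logL += std::log(scale);
    for (int i = 0; i < N; ++i) alpha[i] /= scale;

    // Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t);
    // the per-step scale factors accumulate into log P(obs | model).
    for (std::size_t t = 1; t < obs.size(); ++t) {
        scale = 0.0;
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int i = 0; i < N; ++i)
                s += alpha[i] * TRANS.at<double>(i, j);
            next[j] = s * EMIS.at<double>(j, obs[t]);
            scale += next[j];
        }
        if (scale == 0.0)  // sequence impossible under this model
            return -std::numeric_limits<double>::infinity();
        logL += std::log(scale);
        for (int j = 0; j < N; ++j) alpha[j] = next[j] / scale;
    }
    return logL;
}

// Hypothetical usage, with one trained model per gesture:
//   double scoreL = logLikelihood(leftTRANS,  leftEMIS,  leftINIT,  obs);
//   double scoreR = logLikelihood(rightTRANS, rightEMIS, rightINIT, obs);
//   // The larger score names the gesture; a "garbage" model for all other
//   // transitions is scored the same way and catches non-gestures.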