Problem
While training an HMM with GMM mixtures on my continuous observation sequence data, the cost function decreases gradually and then becomes NaN after some iterations.
Background of my data
I have 2 lists, say St & Rt, with len(St) = 200 & len(Rt) = 100.
Each element in a list is a NumPy array of size 100*5.
Each list contains vehicle driving data in which some maneuvers are performed.
I have attached a picture below of my data set (i.e. St[0], a single element of the list St, which is a NumPy ndarray of size 100*5) and also a picture of the problem.
I tried to train on my first list, which contains the continuous sequences, to get the parameters of the model.
I am giving 5 hidden states & 3 Gaussian mixture components as input to the model.
I calculate the log-likelihood for every sequence, i.e. St[0], St[1], ..., and finally sum them up to get the final cost value.
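A minimal sketch of what I mean (forward_log_likelihood is a placeholder name for my own forward-pass function, not a library call):

    import numpy as np

    # forward_log_likelihood(seq) is assumed to return log P(seq | lambda)
    # for a single observation sequence under the current parameters.
    log_likelihoods = [forward_log_likelihood(seq) for seq in St]
    cost = np.sum(log_likelihoods)  # final cost over all 200 sequences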
When I start the training, it goes well for 5 - 8 iterations, then the cost changes to NaN.
Question
1) What could be the reason for the NaN occurrence?
2) Is there any preprocessing step to be carried out on my data set before providing it as input to the model?
I am new to HMM-GMM modelling. Kindly shed some light on this area with any external sources or links.
Pictures (Problem & Training Data)
Additional Questions
Note: To provide additional information to the people in the comments, I made this Additional Questions section and edited my question.
For example, a list contains the normalized training data.
Total elements inside the list = 75.
Each element inside the list is a NumPy array.
The data inside the list is vehicle driving data, which is continuous.

    X_train = [[100*4], [100*4], [100*4], ..., [100*4]]
    len(X_train) = 75
    X_train[0] = [100*4]
    X_train.columns = ['Veh.Speed', 'Strg Angle', 'Lat_Acceleration', 'Long_Acceleration']
Note: every [100*4] block is data received from the vehicle over a specific time interval. Let's say X_train[0] is 15 to 30 seconds of driving study data, X_train[1] may be another 15 to 30 seconds of driving study data, and so on.
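To make the layout concrete, here is a dummy reconstruction of the structure (the values are random placeholders, not my real data):

    import numpy as np

    # 75 sequences; each one is a (100, 4) array with columns
    # [Veh.Speed, Strg Angle, Lat_Acceleration, Long_Acceleration]
    X_train = [np.random.rand(100, 4) for _ in range(75)]

    print(len(X_train))      # 75
    print(X_train[0].shape)  # (100, 4)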
Clarification needed related to Hidden Markov Model training with Gaussian mixtures:
First I will explain the steps I followed, then list my clarification points.
- Selected 3 hidden states and 2 Gaussian mixture components.
- Initialized the parameters: initial state (pi), trans_matrix (A), respons_gaussian (R), mean (mu), and covariance (sigma) as diagonal covariance.
- Found the emission probability (B) with the help of the initialized parameters above.
- Using the forward algorithm, I find the probability of all elements in X_train and store them in an array, i.e. arr = np.array([P(X_train[0]|λ), P(X_train[1]|λ), P(X_train[2]|λ), ..., P(X_train[74]|λ)]).
- Now I calculate the log of all elements inside the above array and sum the whole array to define the cost, i.e. cost = np.log(arr).sum().
- Update the HMM & mixture parameters through the forward/backward algorithm & the gamma variable.
- Repeat the steps (a rough skeleton of this loop is sketched below).
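Here is that loop as a minimal sketch (compute_emission_B, forward_probability, and update_parameters are placeholder names for my own functions, not library calls):

    import numpy as np

    for iteration in range(max_iterations):
        # Emission probabilities B from the current mixture parameters
        B = [compute_emission_B(x, R, mu, sigma) for x in X_train]

        # Forward algorithm: P(X_train[i] | lambda) for every sequence
        arr = np.array([forward_probability(b, pi, A) for b in B])

        # Cost = sum of log-likelihoods over all 75 sequences
        cost = np.log(arr).sum()
        print(iteration, cost)

        # Forward/backward pass and the gamma variable give the updates
        pi, A, R, mu, sigma = update_parameters(X_train, B, pi, A)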
Now I will describe the points that confuse me while performing the training.
Problem faced: when I print my cost function, it reduces gradually until around 200 - 300 iterations and then becomes NaN around a value of 8950.
What I tried to avoid NaN
I believed the problem was in the learning rate, so I multiplied my learning rate by 0.1 every 75 iterations, so that I update to a new, smaller learning rate (see the sketch below). But after the cost comes to a value of around 8900 to 9000, it becomes NaN once again.
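The decay schedule, as a minimal sketch (learning_rate and iteration are my own variable names):

    # Scale the learning rate by 0.1 every 75 iterations
    if iteration > 0 and iteration % 75 == 0:
        learning_rate *= 0.1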
My Questions
Why does it become NaN after several iterations?
Will the cost function value converge to a local/global optimum, as in gradient descent?
Since I want to run the forward algorithm after training on the X_test data, can I note down the updated parameters (pi, trans_mat, Gaussian mixture matrix, mean, covariance) before the NaN occurs and test the probability with them? Will that produce good results, or is it wrong to do that?
What are the other ways to make my cost function converge?
In what ways can I improve the training, based on the history of my work? If I have missed or gotten something wrong, please let me know.