Problem
While training an HMM with GMM mixtures on my continuous observation sequence data, the cost function decreases gradually and then becomes NaN after some iterations.
Background of my data
I have 2 lists, say St & Rt, with len(St) = 200 & len(Rt) = 100.
Each element in a list is a NumPy array of size 100*5.
Each list contains vehicle driving data in which some maneuvers are performed.
I have attached a picture below of my data set (i.e. St[0], a single element of the list St, which is a NumPy ndarray of size 100*5) and also a picture of the problem.
I tried to train on my first list, which contains the continuous sequences, to get the parameters of the model.
I am giving 5 hidden states & 3 Gaussian mixture components as input to the model.
I calculate the log-likelihood for every sequence, i.e. St[0], St[1], ..., and finally sum them up to get the final cost value.
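A minimal sketch of what I mean (forward_log_likelihood is a placeholder name for my own forward-pass function, not a library call):

    import numpy as np

    # forward_log_likelihood(seq) is assumed to return log P(seq | lambda)
    # for a single observation sequence under the current parameters.
    log_likelihoods = [forward_log_likelihood(seq) for seq in St]
    cost = np.sum(log_likelihoods)  # final cost over all 200 sequences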
When I start the training, it goes well for 5 - 8 iterations, then the cost changes to NaN.
Question
1) What could be the reason for the NaN occurrence?
2) Is there any preprocessing step to be carried out on my data set before providing it as input to the model?
I am new to HMM-GMM modelling. Kindly shed some light on this area with any external sources or links.
Pictures (Problem & Training Data)
Additional Questions
Note: To provide additional information to the people in the comments, I made this Additional Questions section and edited my question.
For example, a list contains the normalized training data.
Total elements inside the list = 75.
Each element inside the list is a NumPy array.
The data inside the list is vehicle driving data, which is continuous.

    X_train = [[100*4], [100*4], [100*4], ..., [100*4]]
    len(X_train) = 75
    X_train[0] = [100*4]
    X_train.columns = ['Veh.Speed', 'Strg Angle', 'Lat_Acceleration', 'Long_Acceleration']
Note: every [100*4] block is data received from the vehicle over a specific time interval. Let's say X_train[0] is 15 to 30 seconds of driving study data, X_train[1] may be another 15 to 30 seconds of driving study data, and so on.
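To make the layout concrete, here is a dummy reconstruction of the structure (the values are random placeholders, not my real data):

    import numpy as np

    # 75 sequences; each one is a (100, 4) array with columns
    # [Veh.Speed, Strg Angle, Lat_Acceleration, Long_Acceleration]
    X_train = [np.random.rand(100, 4) for _ in range(75)]

    print(len(X_train))      # 75
    print(X_train[0].shape)  # (100, 4)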
Clarification needed related to Hidden Markov Model training with Gaussian mixtures:
First I will explain the steps I followed, then list my clarification points.
- Selected 3 hidden states and 2 Gaussian mixture components.
- Initialized the parameters: initial state (pi), trans_matrix (A), respons_gaussian (R), mean (mu), and covariance (sigma) as diagonal covariance.
- Found the emission probability (B) with the help of the initialized parameters above.
- Using the forward algorithm, I find the probability of all elements in X_train and store them in an array, i.e. arr = np.array([P(X_train[0]|λ), P(X_train[1]|λ), P(X_train[2]|λ), ..., P(X_train[74]|λ)]).
- Now I calculate the log of all elements inside the above array and sum the whole array to define the cost, i.e. cost = np.log(arr).sum().
- Update the HMM & mixture parameters through the forward/backward algorithm & the gamma variable.
- Repeat the steps (a rough skeleton of this loop is sketched below).
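Here is that loop as a minimal sketch (compute_emission_B, forward_probability, and update_parameters are placeholder names for my own functions, not library calls):

    import numpy as np

    for iteration in range(max_iterations):
        # Emission probabilities B from the current mixture parameters
        B = [compute_emission_B(x, R, mu, sigma) for x in X_train]

        # Forward algorithm: P(X_train[i] | lambda) for every sequence
        arr = np.array([forward_probability(b, pi, A) for b in B])

        # Cost = sum of log-likelihoods over all 75 sequences
        cost = np.log(arr).sum()
        print(iteration, cost)

        # Forward/backward pass and the gamma variable give the updates
        pi, A, R, mu, sigma = update_parameters(X_train, B, pi, A)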
Now I will describe the points that confuse me while performing the training.
Problem faced: when I print my cost function, it reduces gradually until around 200 - 300 iterations and then becomes NaN around a value of 8950.
What I tried to avoid NaN
I believed the problem was in the learning rate, so I multiplied my learning rate by 0.1 every 75 iterations, so that I update to a new, smaller learning rate (see the sketch below). But after the cost comes to a value of around 8900 to 9000, it becomes NaN once again.
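The decay schedule, as a minimal sketch (learning_rate and iteration are my own variable names):

    # Scale the learning rate by 0.1 every 75 iterations
    if iteration > 0 and iteration % 75 == 0:
        learning_rate *= 0.1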
My Questions
Why does it become NaN after several iterations?
Will the cost function value converge to a local/global optimum, as in gradient descent?
Since I want to run the forward algorithm after training on the X_test data, can I note down the updated parameters (pi, trans_mat, Gaussian mixture matrix, mean, covariance) before the NaN occurs and test the probability with them? Will that produce good results, or is it wrong to do that?
What are the other ways to make my cost function converge?
In what ways can I improve the training, based on the history of my work? If I have missed or gotten something wrong, please let me know.