I'm trying to create a new dataset of hidden state probabilities using a hidden Markov model. Everything works fine unless each time the output dataset comes up with different values (sometimes the same values) for hidden_states_train and hidden_states_test hence resulting a different column sizes in the columns stack/ a feature mismatch. e.g New dataset size (15261, 197) (5087, 194), New dataset size (15261, 197) (5087, 197) etc.
I can't figure out why this is happening each time I run the code. I tried to give same number of samples for both X_train_st and X_test_st but this keeps happening. If I set n_comp in range a smaller range e.g for n_comp in range(1,6) then often it results the same shapes.
Can someone shed some light to what's going on and a possible fix, please?
newX = X_train_st
newXtest = X_test_st
for n_comp in range(1,16):
print("fitting to HMM and decoding %d ..." % n_comp , end="")
modelHMM = GaussianHMM(n_components=n_comp, covariance_type="diag").fit(X_train_st)
hidden_states_train = to_categorical(modelHMM.predict(X_train_st))
hidden_states_test = to_categorical(modelHMM.predict(X_test_st))
print("done")
newX = np.column_stack((newX,hidden_states_train))
newXtest = np.column_stack((newXtest,hidden_states_test))
print('New dataset size',newX.shape,newXtest.shape)