I'm trying to create a new dataset of hidden state probabilities using a hidden Markov model. Everything works, except that each run can produce a different number of columns (sometimes the same number) for hidden_states_train and hidden_states_test, which gives different column counts after the column stack, i.e. a feature mismatch. For example: New dataset size (15261, 197) (5087, 194), New dataset size (15261, 197) (5087, 197), etc.

I can't figure out why this happens each time I run the code. I tried giving X_train_st and X_test_st the same number of samples, but it still happens. If I use a smaller range, e.g. for n_comp in range(1,6), the shapes usually match.

Can someone shed some light on what's going on and suggest a possible fix, please?

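import numpy as np
from hmmlearn.hmm import GaussianHMM
from keras.utils import to_categorical  # assumed import; tensorflow.keras.utils also provides it

# Start from the original features and append one block of state columns per HMM.
newX = X_train_st
newXtest = X_test_st

for n_comp in range(1, 16):
    print("fitting to HMM and decoding %d ..." % n_comp, end="")
    modelHMM = GaussianHMM(n_components=n_comp, covariance_type="diag").fit(X_train_st)

    # One-hot encode the decoded state sequence for each split.
    hidden_states_train = to_categorical(modelHMM.predict(X_train_st))
    hidden_states_test = to_categorical(modelHMM.predict(X_test_st))

    print("done")
    newX = np.column_stack((newX, hidden_states_train))
    newXtest = np.column_stack((newXtest, hidden_states_test))

print('New dataset size', newX.shape, newXtest.shape)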
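
One likely cause, offered as a sketch rather than a confirmed diagnosis: when to_categorical is called without num_classes, it infers the number of columns from the largest label present in the array. If the decoded test sequence never visits the highest-numbered state of a given model, its one-hot block comes out narrower than the training one. Because the HMM fit is randomly initialised, which states get used can change from run to run, and with more components it is more likely that some state goes unused. The NumPy-only example below reproduces the effect with made-up state sequences; the helper names one_hot_inferred and one_hot_fixed are hypothetical and not part of Keras or hmmlearn.

```python
import numpy as np

def one_hot_inferred(labels):
    """One-hot encode, inferring the width from the largest label seen."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(labels.max() + 1, dtype=np.float32)[labels]

def one_hot_fixed(labels, n_states):
    """One-hot encode into exactly n_states columns, whether or not all occur."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(n_states, dtype=np.float32)[labels]

n_comp = 4
# Made-up decoded sequences: the test split never reaches state 3.
train_states = np.array([0, 3, 1, 2, 3])
test_states = np.array([0, 1, 1, 2, 0])

inferred_train = one_hot_inferred(train_states)
inferred_test = one_hot_inferred(test_states)
fixed_train = one_hot_fixed(train_states, n_comp)
fixed_test = one_hot_fixed(test_states, n_comp)

print(inferred_train.shape, inferred_test.shape)  # (5, 4) (5, 3)
print(fixed_train.shape, fixed_test.shape)        # (5, 4) (5, 4)
```

If this is indeed the cause, passing num_classes=n_comp to both to_categorical calls in the loop should keep the train and test blocks the same width. Setting random_state on GaussianHMM should additionally make the runs repeatable.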
Can anyone help, please? Hi marc_s, thanks. Why isn't anyone viewing this question? There are no answers yet. – Sarah M Dec 26 '22 at 07:57

0 Answers