
I'm having trouble implementing an HMM. I'm starting with a pandas DataFrame where I want to use two columns to predict the hidden state. I'm using the hmmlearn package.

I'm following the hmmlearn tutorial section "Working with multiple sequences": https://hmmlearn.readthedocs.io/en/latest/tutorial.html#multiple-sequences

I followed the code below, but set X1 and X2 to my columns:

import numpy as np
from hmmlearn import hmm

X1 = [[0.5], [1.0], [-1.0], [0.42], [0.24]]
X2 = [[2.4], [4.2], [0.5], [-0.24], [0.24]]
X = np.concatenate([X1, X2])
lengths = [len(X1), len(X2)]
model = hmm.GaussianHMM(n_components=3).fit(X, lengths)
predictions = model.predict(X)

The problem is that when I try to predict the state, instead of combining the sequences into one prediction per time step, I'm getting a prediction for each observation. So in this example I want 5 predictions but I'm getting 10. Is there a way to use the columns of a dataframe as independent variables to get one combined prediction?

1 Answer


Currently you are giving the model two sequences of samples, each sequence having 5 observations with only one feature - so 10 observations in total. What you want is to have a single sequence with 5 observations, with two features.

It would look something like this:

X = [[0.5, 2.4], [1.0, 4.2], [-1.0, 0.5], [0.42, -0.24], [0.24, 0.25]]
lengths = [len(X)]
model = hmm.GaussianHMM(n_components=3).fit(X, lengths)
predictions = model.predict(X)

Then you will have only five predictions.
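To see the difference between the two layouts, it can help to compare array shapes before fitting anything. The following is a small sketch using only NumPy and the X1 and X2 values from the question; the names `stacked` and `side_by_side` are illustrative.

```python
import numpy as np

X1 = [[0.5], [1.0], [-1.0], [0.42], [0.24]]
X2 = [[2.4], [4.2], [0.5], [-0.24], [0.24]]

# Joining along rows: two sequences, one feature each.
stacked = np.concatenate([X1, X2])

# Joining along columns: one sequence, two features per observation.
side_by_side = np.hstack([X1, X2])

print(stacked.shape)       # (10, 1)
print(side_by_side.shape)  # (5, 2)
```

The number of rows is the number of predictions `predict` returns, so the second layout is the one that yields five.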

If you want to build X from X1 and X2 as in your example, you can do it using zip:

X = [[x1[0], x2[0]] for x1, x2 in zip(X1, X2)]
mac13k
Jundiaius
  • Thank you! That makes sense. I tried the code and I'm still getting a shape error. ```X = [[5, 4], [1, 2], [1, 5], [42, 24], [24, 25]]``` ```lengths = [len(X)]``` ```hmm.GaussianHMM(n_components=3).fit(X, lengths)``` ```predictions=model.predict(X)``` ```ValueError: could not broadcast input array from shape (10) into shape (5)``` Do you know why this is occurring? – Christine Cao Feb 22 '21 at 00:23
  • This code works, but results in a warning: ```Fitting a model with 20 free scalar parameters with only 10 data points will result in a degenerate solution.```, so it seems like multi-feature series are not supported. – mac13k Aug 16 '21 at 11:09
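The two numbers in that warning can be reproduced with a little arithmetic, which suggests it is about the amount of data rather than the number of features. The sketch below assumes diagonal covariances (the `GaussianHMM` default) and counts values as observations times features; the helper name `gaussian_hmm_free_params` is hypothetical and not part of hmmlearn.

```python
def gaussian_hmm_free_params(n_components, n_features):
    """Count free scalars for a Gaussian HMM with diagonal covariances."""
    start = n_components - 1                   # start probabilities sum to 1
    trans = n_components * (n_components - 1)  # each transition row sums to 1
    means = n_components * n_features          # one mean per state and feature
    covars = n_components * n_features         # diagonal variances only
    return start + trans + means + covars

n_params = gaussian_hmm_free_params(n_components=3, n_features=2)
n_values = 5 * 2  # five observations with two features each

print(n_params, n_values)  # 20 10
```

With three states and two features there are more parameters than observed values, so a longer sequence or fewer states would be the usual way to avoid the warning.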