
I am trying to use hmmlearn's GaussianHMM to fit a Hidden Markov Model with 2 main states, while allowing for multiple exogenous variables. My goal is to identify two states of GDP growth (one with low variance and the other with high variance), where the state depends on lagged unemployment, lagged commercial confidence level, etc. I have a couple of questions:

  1. I have read through the documentation of hmmlearn's GaussianHMM but cannot find any mention of exogenous variables. The method fit(X, lengths=None) accepts an X with n_features columns. Do I understand correctly that I should pass in an array whose first column is the endogenous variable (GDP growth in my case) and whose remaining columns are the exogenous variables?
  2. Is hmmlearn's GaussianHMM equivalent to statsmodels.tsa.regime_switching.markov_regression.MarkovRegression? That model accepts exog_tvtp, which means that exogenous variables are used to compute a time-varying transition probability matrix.

An example of fitting the monthly returns of the S&P500, no exogenous variable.

import numpy as np
import pandas as pd
from hmmlearn.hmm import GaussianHMM
import yfinance as yf
sp500 = yf.download("^GSPC")["Adj Close"]

# Fitting an absolute return model because we only care about volatility #
rets = np.log(sp500/sp500.shift(1)).dropna()
rets.index = pd.to_datetime(rets.index)
rets = rets.resample("M").sum()
model = GaussianHMM(n_components=2)
model.fit(rets.to_frame())
state_sequence = model.predict(rets.to_frame())

Suppose I want the returns of the S&P500 to depend on exogenous variables, for example on economic growth or past volatilities. Is there a way to do this? Thanks for any help.

Bach Pham
  • please let me know what you think of my answer. If you feel this is the appropriate answer, please accept it. – batlike Nov 11 '20 at 07:11

1 Answer


n_features is the dimensionality of each observation, i.e. the number of columns of X; the rows of X are the time steps. It should not be conflated with the regressors of, e.g., a regression model.

  1. If your hidden states are the two states of GDP growth, then the observed variables (the emissions) that you infer the hidden states from make up the feature space (a.k.a. n_features).
  • GaussianHMM does accept multivariate emissions, but it treats every column of X as something emitted by the hidden state. It has no notion of exogenous variables, so adding lagged unemployment or confidence as extra columns models them jointly with GDP growth; it does not make the transition probabilities depend on them.

Suggestions

  • If I understand your question correctly, perhaps what you are looking for is a Kalman filter. A KF estimates unknown quantities from multiple measurements over time (i.e. all of your exogenous variables), and those estimates tend to be more accurate than ones based on a single measurement.
  • If you want each hidden state to have multiple independent emissions, then what you might be looking for is a structured perceptron. This is discussed here: Hidden Markov Model for multiple observed variables
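For the dependence on lagged exogenous variables described in the question, the relevant idea is a time-varying transition probability, which is what the exog_tvtp option of statsmodels' MarkovRegression refers to. The following is only a rough NumPy sketch of the filtering step for a two-state Gaussian model, not how either library implements it. The function name tvtp_filter, the logistic link, and all parameter values are assumptions chosen for illustration; in practice the parameters would be estimated rather than supplied.

```python
import numpy as np

def tvtp_filter(y, z, mu, sigma, beta):
    """Filtered state probabilities for a 2-state Gaussian regime model.

    z[t] is an already-lagged exogenous value that drives the transition
    into period t. beta[j] = (intercept, slope) of the logit of
    P(stay in state j | z[t]). All names here are hypothetical.
    """
    n = len(y)
    filtered = np.zeros((n, 2))
    prob = np.array([0.5, 0.5])  # assumed flat initial distribution
    for t in range(n):
        # Logistic link: probability of staying in each state given z[t]
        stay = 1.0 / (1.0 + np.exp(-(beta[:, 0] + beta[:, 1] * z[t])))
        P = np.array([[stay[0], 1.0 - stay[0]],
                      [1.0 - stay[1], stay[1]]])  # rows index the from-state
        predicted = prob @ P
        # Gaussian density of y[t] under each state's mean and std
        dens = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = predicted * dens
        prob = joint / joint.sum()
        filtered[t] = prob
    return filtered

# Synthetic example: a calm first half followed by a volatile second half.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
y = np.concatenate([rng.normal(0.5, 0.3, 100), rng.normal(0.0, 2.0, 100)])
mu = np.array([0.5, 0.0])
sigma = np.array([0.3, 2.0])
beta = np.array([[2.0, 0.5], [2.0, -0.5]])
probs = tvtp_filter(y, z, mu, sigma, beta)
```

The key difference from the hmmlearn setup is that the matrix P is rebuilt at every time step from z[t], whereas GaussianHMM uses the same transmat_ throughout.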
batlike