
I have been trying to use the hmmlearn package in Python to build a model that predicts values of a time series. I based my code on this article, which details how to use the package on a stock price time series.

After fitting the model on a large segment of the time series and trying to predict the remainder, I run into an issue. The model always picks the same outcome as the most probable one: hmm.score returns the highest log-likelihood for the same candidate outcome at every point in the test series. Moreover, the outcome it picks is the candidate closest to the mean of the series it was fitted on. It never deviates. Is the model deficient, or am I doing something wrong?

The code that does the prediction is below. It appends all of the possible_outcomes (defined immediately below) to a sequence of test points in the time series (the last 100 in the test dataset) and evaluates the likelihood (using hmm.score):


import numpy as np

# Candidate fractional changes to score at each step
possible_outcomes = np.linspace(-0.1, 0.1, 10)

latency_days = 10

def predict_close_price(time_index):
    open_price = actuals_test[time_index]
    predicted_frac_change = get_most_probable_outcome(time_index)
    return open_price * (1 + predicted_frac_change)


def get_most_probable_outcome(time_index):
    previous_data_start_index = max(0, time_index - latency_days)
    previous_data_end_index = max(0, time_index - 1)
    prev_start = int(previous_data_start_index)
    prev_end = int(previous_data_end_index)
    previous_data = test_data[prev_start: prev_end]

    outcome_score = []
    for possible_outcome in possible_outcomes:
        # np.vstack replaces np.row_stack, which is a deprecated alias
        total_data = np.vstack((previous_data, possible_outcome))
        outcome_score.append(hmm.score(total_data))
    most_probable_outcome = possible_outcomes[np.argmax(outcome_score)]
    print(most_probable_outcome)
    return most_probable_outcome

predicted_close_prices = []
actuals_vector = []
for time_index in range(len(actuals_test)-100,len(actuals_test)-1):
    predicted_close_prices.append(predict_close_price(time_index))
    actuals_vector.append(actuals_test[(time_index)])
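
To make explicit what the scoring loop above computes, here is a minimal NumPy sketch of the same kind of quantity: the log-likelihood of a sequence under a Gaussian HMM, obtained with the forward algorithm. It assumes 1-D Gaussian emissions and made-up two-state parameters (not my fitted model); `gaussian_logpdf` and `log_likelihood` are hypothetical helpers standing in for hmm.score, not part of hmmlearn.

```python
import numpy as np

def gaussian_logpdf(x, means, variances):
    """Log density of x under 1-D Gaussians with the given means and variances."""
    return -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)

def log_likelihood(obs, startprob, transmat, means, variances):
    """Forward algorithm in log space: log P(obs | model)."""
    log_alpha = np.log(startprob) + gaussian_logpdf(obs[0], means, variances)
    for x in obs[1:]:
        # trans[i, j] = log_alpha[i] + log P(state j | state i)
        trans = log_alpha[:, None] + np.log(transmat)
        m = trans.max(axis=0)
        log_alpha = m + np.log(np.exp(trans - m).sum(axis=0))
        log_alpha = log_alpha + gaussian_logpdf(x, means, variances)
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

# Hypothetical parameters, chosen only for illustration
startprob = np.array([0.5, 0.5])
transmat = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-0.01, 0.02])
variances = np.array([0.0004, 0.0009])

# Hypothetical recent fractional changes, followed by each candidate in turn
history = np.array([0.01, -0.02, 0.015, 0.03])
candidates = np.linspace(-0.1, 0.1, 10)
scores = [
    log_likelihood(np.append(history, c), startprob, transmat, means, variances)
    for c in candidates
]
best = candidates[int(np.argmax(scores))]
```

Because the history is identical for every candidate, the scores differ only through the density the model assigns to the final appended value given that history.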

I don't know if the issue is with the above, or with the actual creation of data and fitting of the model itself. That is done simplistically as follows:

timeSeries.reverse()

difference_fracs = []

for i in range(0, len(timeSeries)-1):
    difference_frac = ((timeSeries[i+1] - timeSeries[i])/(timeSeries[i]))
    difference_fracs.append(difference_frac)

differences_array = np.array(difference_fracs)
differences_array = np.reshape(differences_array, (-1,1))

train_data_length = 2000

train_data = differences_array[:train_data_length,:]
test_data = differences_array[train_data_length:len(timeSeries),:]
actuals_test = timeSeries[train_data_length:]

n_hidden_states = 4

from hmmlearn.hmm import GaussianHMM

hmm = GaussianHMM(n_components=n_hidden_states)
hmm.fit(train_data)  # fit on the training slice defined above
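
For reference, the fractional-change loop above can also be written in vectorised form. This is a minimal sketch on a made-up four-point series (the values are hypothetical, not my data), assuming the series is a plain list of prices already in chronological order:

```python
import numpy as np

# Hypothetical prices, used only to illustrate the transformation
time_series = [100.0, 102.0, 99.96, 104.958]

prices = np.asarray(time_series, dtype=float)
# (p[i+1] - p[i]) / p[i] for every consecutive pair
difference_fracs = np.diff(prices) / prices[:-1]
# Column vector of shape (n_samples, 1), the layout passed to fit above
differences_array = difference_fracs.reshape(-1, 1)
```

Note that the result has one row fewer than the price series, which matters when lining up test_data with actuals_test.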

I realize most of this is meaningless without the actual time series, which I am not allowed to share - though if someone has had similar issues in the past, I would love to hear your thoughts.
