How do I invert the stationarity and reapply the dates to the data for plotting?

I am trying to invert stationarity and get a plot of the predictions, particularly for two columns called ' app_1' and ' app_2' (the orange and red lines below).

The data I am drawing from looks like this:

[image: plotted data set]

print(u1.info())
u1.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15011 entries, 2017-08-28 11:00:00 to 2018-01-31 19:30:00
Freq: 15T
Data columns (total 10 columns):
 app_1        15011 non-null float64
 app_2        15011 non-null float64
user          15011 non-null object
 bar          15011 non-null float64
 grocers      15011 non-null float64
 home         15011 non-null float64
 lunch        15011 non-null float64
 park         15011 non-null float64
 relatives    15011 non-null float64
 work         15011 non-null float64
dtypes: float64(9), object(1)
memory usage: 1.3+ MB

                        app_1  app_2    user  bar  grocers  home  lunch  park  relatives  work
date
2017-08-28 11:00:00  0.010000    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:15:00  0.010125    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:30:00  0.010250    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:45:00  0.010375    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 12:00:00  0.010500    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0

Each location column indicates whether the user is at that location at a given time. After the first "significant location change" event, one and only one location column will be 1 at a time.

I am analyzing this with VARMAX, using the statsmodels VARMAX class as the multivariate version of AR:

from statsmodels.tsa.statespace.varmax import VARMAX
import pandas as pd
import numpy as np

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt

from random import random
from sklearn.metrics import mean_squared_error  # needed for the error printed below
#...

columns = [ ' app_1', ' app_2', ' bar', ' grocers', ' home', ' lunch', ' work', ' park', ' relatives' ]
series = u1[columns]

# from: https://machinelearningmastery.com/make-predictions-time-series-forecasting-python/
# create a difference transform of the dataset
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return np.array(diff)

# Make a prediction given regression coefficients and lag obs
def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat

X = pd.DataFrame()
for column in columns:
    X[column] = difference(series[column].values)

size = (4*24)*54 # hoping
train, test = X[0:size], X[size:size+(14*4*24)]

train = train.loc[:, (train != train.iloc[0]).any()] # https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column
test = test.loc[:, (test != test.iloc[0]).any()] # https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column

#print(train.var(), X.info())

# train autoregression
model = VARMAX(train)
model_fit = model.fit(method='powell', disp=False)
#print(model_fit.mle_retvals)

##window = model_fit.k_ar
coef = model_fit.params

# walk forward over time steps in test
history = [train.iloc[i] for i in range(len(train))]
predictions = list()
for t in range(len(test)):
    yhat = predict(coef, history)
    obs = test.iloc[t]
    predictions.append(yhat)
    history.append(obs) 

print(mean_squared_error(test, predictions))

0.5594208989876831

That mean_squared_error from scikit-learn is not horrifying (it's roughly the middle of the three samples shown in the documentation, in fact). That could mean that the data is predictive. I'd like to see that in a plot.

# plot
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()

[image: plot of predictions]

So, part of what is going on here is that the data is seasonal, so it had to be differenced to make it stationary. Now the lines are all vertical, instead of temporal.

But another thing that concerns me is the scale of the red data. That's a lot of red. Anyway, how do I invert the stationarity and reapply the dates to the data for plotting? It obviously should not look like that. :)

roberto tomás

1 Answer

The way to do this turned out to be, first, to make the predictions into a DataFrame:

predDf = pd.DataFrame(predictions)
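
The remaining steps might look roughly like the sketch below. It assumes the predictions are on the first-differenced scale produced by `difference` in the question, with one row per test timestep and one column per series. The helper `invert_difference` and the toy values are hypothetical, not part of the original code. The idea is that a first-order difference is undone by a cumulative sum added to the last observed level, and that differenced row `i` corresponds to original row `i + 1`, which tells you which slice of the original DatetimeIndex to reattach.

```python
import numpy as np
import pandas as pd

def invert_difference(diff_df, last_levels, index):
    """Undo a first-order difference (hypothetical helper).

    diff_df     : differenced values, one row per timestep
    last_levels : the last observed undifferenced value of each column
    index       : the dates to attach to the restored rows
    """
    restored = diff_df.cumsum() + np.asarray(last_levels)
    restored.index = index
    return restored

# Toy stand-in for the question's data (made-up values, 15-minute frequency).
dates = pd.date_range("2017-08-28 11:00", periods=6, freq="15min")
levels = pd.DataFrame({" app_1": [1.0, 1.5, 2.5, 2.0, 3.0, 3.5],
                       " app_2": [0.0, 0.0, 1.0, 1.0, 2.0, 4.0]}, index=dates)

# Same values as difference() in the question, but the index is lost there,
# so it is dropped here as well.
diffs = levels.diff().dropna().reset_index(drop=True)

size = 3                                    # rows of the differenced data used for training
pred_diffs = diffs.iloc[size:].reset_index(drop=True)  # stand-in for predDf

# Differenced row i is levels[i + 1] - levels[i], so the last level seen
# before the test window is original row `size`.
last_levels = levels.iloc[size]
test_dates = levels.index[size + 1: size + 1 + len(pred_diffs)]

restored = invert_difference(pred_diffs, last_levels, test_dates)
```

With the dates restored, `restored[' app_1'].plot()` should draw against a time axis again. Two caveats: if constant columns were dropped before fitting, `last_levels` needs the same column selection, and for one-step-ahead walk-forward predictions it may be more appropriate to add each predicted difference to the previous observed level rather than to a running cumulative sum.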
roberto tomás