I'm trying to forecast a time series: given the 50 previous values, I want to predict the next 5 values.
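To make the shapes concrete, here is a minimal NumPy sketch of that framing. It assumes a 1-D series and uses a hypothetical helper `make_windows` and a toy sine series standing in for the CSV data. Each input row holds 50 consecutive values and each target row holds the 5 values that follow, so the targets form an (n_samples, 5) matrix rather than a single column. The `+ 1` in the window count includes the last complete window.

```python
import numpy as np

def make_windows(series, n_prev=50, n_next=5):
    """Split a 1-D series into (inputs, targets) pairs.

    Each row of X holds n_prev consecutive values; the matching row of Y
    holds the n_next values that immediately follow them.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - n_prev - n_next + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + n_prev] for i in range(n_windows)])
    Y = np.stack([series[i + n_prev:i + n_prev + n_next] for i in range(n_windows)])
    return X, Y

# Toy series of 1,000 points (a stand-in for the real CSV data).
ts = np.sin(np.linspace(0, 20, 1000))
X, Y = make_windows(ts)
print(X.shape, Y.shape)  # (946, 50) (946, 5)
```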

To do so, I'm using the skflow package (based on TensorFlow), and this problem is quite similar to the Boston example provided in the GitHub repo.

My code is as follows:

%matplotlib inline
import numpy as np
import pandas as pd

import skflow
from sklearn import cross_validation, metrics
from sklearn import preprocessing

filepath = 'CSV/FILE.csv'
ts = pd.Series.from_csv(filepath)

nprev = 50      # number of past values fed to the model
deltasuiv = 5   # number of future values to predict

def load_data(data, n_prev=nprev, delta_suiv=deltasuiv):
    # Slide a window over the series: each row of X holds n_prev
    # consecutive values, each row of y the delta_suiv values that follow.
    docX, docY = [], []
    for i in range(len(data)-n_prev-delta_suiv):
        docX.append(np.array(data[i:i+n_prev]))
        docY.append(np.array(data[i+n_prev:i+n_prev+delta_suiv]))
    alsX = np.array(docX)
    alsY = np.array(docY)

    return alsX, alsY

X, y = load_data(ts.values)  # X: (n_samples, 50), y: (n_samples, 5)
# Scale data to 0 mean and unit std dev.
scaler = preprocessing.StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y,
    test_size=0.2, random_state=42)
regressor = skflow.TensorFlowDNNRegressor(hidden_units=[30, 50],
    steps=5000, learning_rate=0.1, batch_size=1)
regressor.fit(X_train, y_train)
score = metrics.mean_squared_error(regressor.predict(X_test), y_test)
print('MSE: {0:f}'.format(score))

At the end of training, this raises:

ValueError: y_true and y_pred have different number of output (1!=5)

I get the same kind of problem when I call predict directly:

ypred = regressor.predict(X_test)
print(ypred.shape, y_test.shape)

(200, 1) (200, 5)

So the model is predicting only 1 value per sample instead of the 5 I want.
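This kind of mismatch is easy to catch before scoring. Below is a minimal NumPy sketch (the helper `per_step_mse` is hypothetical, not part of skflow or scikit-learn) that insists both arrays are shaped (n_samples, n_outputs) and reports the error separately for each of the 5 forecast steps.

```python
import numpy as np

def per_step_mse(y_true, y_pred):
    """Mean squared error for each forecast step (column).

    Both arrays must be shaped (n_samples, n_outputs); a (200, 1) prediction
    against a (200, 5) target is rejected instead of silently broadcast.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: y_true {y_true.shape} vs y_pred {y_pred.shape}")
    return np.mean((y_true - y_pred) ** 2, axis=0)

y_test = np.zeros((200, 5))
good = np.ones((200, 5))
print(per_step_mse(y_test, good))  # [1. 1. 1. 1. 1.]

bad = np.ones((200, 1))  # what the single-output model returns
try:
    per_step_mse(y_test, bad)
except ValueError as err:
    print(err)  # shape mismatch: y_true (200, 5) vs y_pred (200, 1)
```

The explicit check matters because plain NumPy would broadcast a (200, 1) array against a (200, 5) one without complaint. Recent scikit-learn versions give a similar per-output breakdown via `mean_squared_error(..., multioutput='raw_values')`.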

How can I use the same model to predict several values at once?

dga, Julian

1 Answer

I've just added support for multi-output regression to skflow (as of #e443c734), so please reinstall the package and try again. If it still doesn't work, please follow up on GitHub.

I also added an example of multi-output regression to the examples folder:

import numpy as np
import skflow
from sklearn.metrics import mean_squared_error

# Create random dataset: one input feature, two targets (scaled sin and cos).
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T

# Fit regression DNN model.
regressor = skflow.TensorFlowDNNRegressor(hidden_units=[5, 5])
regressor.fit(X, y)
score = mean_squared_error(regressor.predict(X), y)
print("Mean Squared Error: {0:f}".format(score))
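The underlying idea does not depend on skflow: a multi-output regressor only needs an output layer with one unit per target, so predictions come back shaped (n_samples, n_targets). As a dependency-free sketch of that idea (plain NumPy linear least squares swapped in for skflow's DNN; the helper names and the synthetic data are hypothetical), a single weight matrix with one column per forecast step predicts all 5 steps at once:

```python
import numpy as np

def fit_multioutput_linear(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W, with one column of W per target."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)      # W: (n_features + 1, n_targets)
    return W

def predict_multioutput_linear(W, X):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W                                    # (n_samples, n_targets)

rng = np.random.RandomState(0)
X = rng.randn(300, 50)                              # 300 windows of 50 previous values
true_W = rng.randn(51, 5)                           # hypothetical ground-truth weights
Y = np.hstack([X, np.ones((300, 1))]) @ true_W      # 5 noise-free targets per window

W = fit_multioutput_linear(X, Y)
Y_hat = predict_multioutput_linear(W, X)
print(Y_hat.shape)  # (300, 5)
```

Fitting all targets jointly like this is equivalent to fitting 5 independent linear models; a neural network gains from the shared hidden layers, but the output shape logic is the same.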
twiz, ilblackdragon
  • I don't think this code works anymore. When `fit()` is called, the following error is thrown: `Shapes (?, 1) and (?, 2) are incompatible`. (I'm using version `0.10.0rc0`.) I also created a new question about this problem: http://stackoverflow.com/questions/39192107/multiple-target-columns-with-skflow-tensorflowdnnregressor – twiz Sep 03 '16 at 23:46
  • See this answer http://stackoverflow.com/questions/39935394/multiple-regression-output-nodes-in-tensorflow-learn/40164742 – drenerbas Oct 21 '16 at 14:27
  • Your examples folder 404s. – JeffHeaton Dec 11 '16 at 03:17
  • I believe this is a bug in Learn... submitted: https://github.com/tensorflow/tensorflow/issues/6849 – JeffHeaton Jan 14 '17 at 14:56