
Assume that I have five columns in my dataset (A,B,C,D,E) and I want to build an LSTM model by training just on A,B,D,E (i.e. I want to exclude C)

My problem is that I still want to use this model to predict C. Is it possible if I didn't train my model with this variable? How can I do that?

EDIT 1
I'm working with categorical and numerical data modeled as time series. In this specific case, C is a categorical time series (given in a one-hot representation).

Veltzer Doron
Alessandro
  • You should really explain in detail what each column means, how many sequences you have, what the sequence lengths are... – Daniel Möller Jan 31 '18 at 01:04
  • Nope, not in any real sense and only in a very roundabout manner. Why don't you want your model to see samples of what you want it to predict? – Veltzer Doron Feb 05 '18 at 18:49
  • @VeltzerDoron because my model repeats the previous values of the target signal, so I'm trying to obfuscate its values. Maybe it's not a good solution, but I want to achieve something like this. – Alessandro Feb 05 '18 at 21:20
  • @Ghemon I'm not sure I follow you, do you mean it generalizes poorly? – Veltzer Doron Feb 05 '18 at 23:29
  • @VeltzerDoron yes, I think this is the issue. For this reason I decided to predict C only with the information brought from A,B,D,E. Am I clear? – Alessandro Feb 06 '18 at 08:57
  • Not sure, maybe if we continue discussing it it'll become clearer. Why don't you add dropout or other forms of variational noise instead? – Veltzer Doron Feb 06 '18 at 10:32
  • @VeltzerDoron I tried. Actually I opened a question for this problem: https://stackoverflow.com/questions/47618285/why-my-lstm-model-is-repeating-the-previous-values. Maybe you could find it interesting. – Alessandro Feb 06 '18 at 12:11

2 Answers


Yes, you can! However, there needs to be a correlation between field C and the other columns. If not, then the predictions will be close to random.

  • Train the model using A,B,D,E as the input (x)
  • Make C the output (y)

Divide the dataset into train, test, and validation sets.
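A minimal sketch of that setup in NumPy (the column order A,B,C,D,E → indices 0–4, the dummy data, and the 70/15/15 split ratio are all illustrative assumptions; for time series, split chronologically rather than shuffling):

```python
import numpy as np

# Hypothetical dataset: 100 time steps, 5 features (A, B, C, D, E)
data = np.arange(500, dtype=float).reshape(100, 5)

# Input: columns A, B, D, E (indices 0, 1, 3, 4); target: column C (index 2)
X = data[:, [0, 1, 3, 4]]
y = data[:, 2]

# Chronological 70/15/15 split (no shuffling, to respect time order)
n = len(X)
train_end, val_end = int(0.7 * n), int(0.85 * n)
X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]
```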

To answer your other question (Is it possible also if I didn't train my model with this variable?)

  • No. Without C in the training data, the model has no way to learn the mapping from the 4 input fields to the output field, which in this case is C.

To understand this problem, compare your approach to the Boston Housing dataset.

import pandas as pd
import numpy as np

# Read dataset into X and Y
df = pd.read_csv('YOURDATASET.csv', delim_whitespace=True, header=None)

dataset = df.values


# Your dataset is now a matrix: one row per sample, one column per feature (A, B, C, D, E)


# Stack columns A, B, D, E side by side as the input; C (index 2) is the target
X = np.hstack((dataset[:, 0:2], dataset[:, 3:5]))
Y = dataset[:, 2]


# print("X: ", X)
# print("Y: ", Y)


# Define the neural network
from keras.models import Sequential
from keras.layers import Dense

def build_nn():
    model = Sequential()
    # input_dim=4 because we train on the four columns A, B, D, E
    model.add(Dense(20, input_dim=4, kernel_initializer='normal', activation='relu'))
    # No activation needed in output layer (because regression)
    model.add(Dense(1, kernel_initializer='normal'))

    # Compile Model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model


# Evaluate model (kFold cross validation)
from keras.wrappers.scikit_learn import KerasRegressor

# sklearn imports:
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Standardise the data before feeding it into the network, because the input variables vary in scale
estimators = []
estimators.append(('standardise', StandardScaler()))
estimators.append(('multiLayerPerceptron', KerasRegressor(build_fn=build_nn, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)

print("Mean: ", results.mean())
print("StdDev: ", results.std())
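One caveat given EDIT 1: since C is a categorical series in one-hot form, the regression head above should become a classification head, i.e. a softmax output trained with categorical cross-entropy. A minimal sketch (the layer width and `n_classes` are illustrative assumptions, not taken from the question):

```python
from keras.models import Sequential
from keras.layers import Dense

n_classes = 3  # illustrative: number of categories in C

def build_classifier(input_dim=4, n_classes=n_classes):
    model = Sequential()
    model.add(Dense(20, input_dim=input_dim, activation='relu'))
    # Softmax output so predictions form a probability distribution
    # over C's categories, matching the one-hot targets
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
```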
0bserver07
  • I think it's easy to misunderstand the question. If it was meant like "Is it possible also if I didn't train my model with this variable as Input?" then the answer is Yes and the answer is correct. – Manngo Jan 31 '18 at 18:06
  • You're right, I guess I was trying to justify my answer with more proof, because I wasn't sure which direction the question was intended in by @Ghemon – 0bserver07 Feb 02 '18 at 18:14

One way to do this is for your network to simply predict C, i.e. to have C as the label.

I have been seeing this again and again: don't confuse a NN with something more than it actually is. You simply approximate the output Y given an input X by learning a function F. That is your NN.

In your case the output could very easily be C + Other_Output. Depending on what that other output is, your network could converge and have good results; it could very well not. So, at this point, your question is simply incomplete.

You have to ask yourself some questions like:

  1. Does C + Other_Output make sense for the given input?
  2. Is there a good way for me to serialize the C + Other_Output pair? For example, having the first K out of N output array elements describe C and the remaining N-K describe Other_Output?
  3. Is C a multiclass problem? If so, is Other_Output a different kind of problem, or could it be turned into a multiclass problem of the same kind that converges along with C, or could the two be merged into one multilabel problem?

These are at least some of the questions you need to ask yourself before even choosing the architecture.
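If the answer to question 2 is yes, one common pattern is a two-headed model rather than a serialized output vector. This sketch uses the Keras functional API; the layer width, `n_c_classes`, and `n_other` are illustrative assumptions (a softmax head for the categorical C, a linear head for a regression-style Other_Output):

```python
from keras.models import Model
from keras.layers import Input, Dense

n_features = 4   # A, B, D, E
n_c_classes = 3  # illustrative: number of categories in C
n_other = 1      # illustrative: size of Other_Output

inp = Input(shape=(n_features,))
h = Dense(20, activation='relu')(inp)

# Head 1: multiclass prediction of C (one-hot targets)
c_out = Dense(n_c_classes, activation='softmax', name='c_out')(h)
# Head 2: the other output, here regression-style (linear activation)
other_out = Dense(n_other, name='other_out')(h)

model = Model(inputs=inp, outputs=[c_out, other_out])
# Each head gets its own loss, so the two tasks train jointly
model.compile(optimizer='adam',
              loss={'c_out': 'categorical_crossentropy',
                    'other_out': 'mean_squared_error'})
```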
That being said, no: unless you train your network to learn the patterns between A, B, D, E and C, it will not be able to predict a missing input.

Good luck,
Gabriel

Gabriel Bercea