
Below is the code I use to predict whether the price closes up or down the next day (up = 1, down = 0).

I created a dataframe and use only PriceChange (1 if today's close is above yesterday's close) to predict Closeupnextday (1 if the next day's close is above today's close):

df['PriceChange'] = (df['Close'] > df['Close'].shift(1)).astype(int)
df['Closeupnextday'] = (df['Close'].shift(-1) > df['Close']).astype(int)

So the dataframe looks like this:

            PriceChange  Closeupnextday
    0             0               1
    1             1               1
    2             1               1
    3             1               1
    4             1               0
    5             0               0
    6             0               0
    7             0               1

It constantly gives me an accuracy of 1.000. Realistically it should be only a little over 50%. I believe something is wrong in the code below, but I can't find it.

I should add that after epoch 20/500 it constantly gives me 1.000 accuracy

Any advice, please?

def load_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data = stock.values  # as_matrix() was removed in pandas 1.0; .values gives the same NumPy array
    sequence_length = seq_len + 1
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])

    result = np.array(result)
    row = round(0.9 * result.shape[0])
    train = result[:int(row), :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:,-1]
    x_test = result[int(row):, :-1]
    y_test = result[int(row):, -1][:,-1]

    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))  

    return [x_train, y_train, x_test, y_test]

def build_model(layers):
    model = Sequential()

    model.add(LSTM(
        input_dim=layers[0],
        output_dim=layers[1],
        return_sequences=True))
    model.add(Dropout(0.0))

    model.add(LSTM(
        layers[2],
        return_sequences=False))
    model.add(Dropout(0.0))

    model.add(Dense(
        output_dim=layers[2]))
    model.add(Activation("linear"))

    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop",metrics=['accuracy'])
    print("Compilation Time : ", time.time() - start)
    return model

def build_model2(layers):
        d = 0.2
        model = Sequential()
        model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
        model.add(Dropout(d))
        model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
        model.add(Dropout(d))
        model.add(Dense(16, activation="relu", kernel_initializer="uniform"))        
        model.add(Dense(1, activation="relu", kernel_initializer="uniform"))
        model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
        return model


window = 5
X_train, y_train, X_test, y_test = load_data(df[::-1], window)
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape) 

# model = build_model([3,lag,1])
model = build_model2([len(df.columns),window,1]) #11 = Dataframe axis 1

model.fit(
    X_train,
    y_train,
    batch_size=512,
    epochs=500,
    validation_split=0.1,
    verbose=1)


trainScore = model.evaluate(X_train, y_train, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore[0], math.sqrt(trainScore[0])))

testScore = model.evaluate(X_test, y_test, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore[0], math.sqrt(testScore[0])))


# print(X_test[-1])
diff=[]
ratio=[]
p = model.predict(X_test)
for u in range(len(y_test)):
    pr = p[u][0]
    ratio.append((y_test[u]/pr)-1)
    diff.append(abs(y_test[u]- pr))
    #print(u, y_test[u], pr, (y_test[u]/pr)-1, abs(y_test[u]- pr))


print(p)
print(y_test)
Mario
J Ng
  • Check whether you have accidentally included the target values as training data. Unless you have made that kind of mistake, this is impossible I guess. – Dimuth Tharaka Menikgama Nov 05 '17 at 13:24
  • My dataframe has no issues... I think it's with the code but I can't figure it out – J Ng Nov 05 '17 at 13:34
  • Out of curiosity, why'd you opt for minimising the MSE and have your final layer be a ReLU for a classification? – jonnybazookatone Nov 05 '17 at 14:37
  • This is adapted from other code, so these are defaults which I did not change. Any suggestion what would be better? – J Ng Nov 05 '17 at 14:58
  • Have you output a test set of data to, say, a csv to do some visual inspection? This always helps me determine whether I have a good model or something is wrong. Also, as others allude to, `mse` is for regression problems; this is binary classification, so your y's should be binary and your loss should be `binary_crossentropy` – DJK Nov 05 '17 at 17:26
  • How big is your data set? You said it gives you accuracy of 1 after epoch 20/500. If your data set is too small then your model may be overfitting. – myrtlecat Nov 05 '17 at 19:16
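
The leakage concern raised in the first comment can be checked directly. The following is a minimal sketch, not the asker's actual data: it assumes a dataframe with only the two columns shown in the question, uses a synthetic random-walk `close` series, and `make_windows` is a hypothetical helper that mirrors the windowing in `load_data`. It compares the PriceChange value in the last input row of each window with the target, once for the reversed frame (`df[::-1]`, as passed to `load_data`) and once for the original order.

```python
import numpy as np
import pandas as pd

# Synthetic prices (assumption: stand-in for the real Close column).
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(size=300).cumsum())

df = pd.DataFrame({
    "PriceChange": (close > close.shift(1)).astype(int),
    "Closeupnextday": (close.shift(-1) > close).astype(int),
})

def make_windows(frame, seq_len):
    """Hypothetical helper: same slicing as load_data, without the train/test split."""
    data = frame.to_numpy()
    seqs = np.array([data[i:i + seq_len + 1]
                     for i in range(len(data) - seq_len - 1)])
    x = seqs[:, :-1]      # first seq_len rows, all columns
    y = seqs[:, -1, -1]   # last row, last column (Closeupnextday)
    return x, y

x_rev, y_rev = make_windows(df[::-1], 5)  # reversed, as in the question
x_fwd, y_fwd = make_windows(df, 5)        # chronological order

# Fraction of windows where the last input row's PriceChange equals the target.
leak_rev = float(np.mean(x_rev[:, -1, 0] == y_rev))
leak_fwd = float(np.mean(x_fwd[:, -1, 0] == y_fwd))
print("reversed:", leak_rev, "forward:", leak_fwd)
```

Under these assumptions, a reversed-order agreement of 1.0 would mean the target is already present in the inputs, because Closeupnextday on one day is by construction the same value as PriceChange on the following day. Whether this applies to the real dataframe depends on which columns it actually contains.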

1 Answer


(Since you don't clarify it, I assume here that you are talking about the test accuracy - the train accuracy can indeed be 1.0, depending on the details of your data & model.)

Well, such issues are common when one mixes up problems, losses, and metrics; see this answer of mine for a similar confusion, where binary_crossentropy is used as the loss in Keras for a multi-class classification problem.

Before trying any remedy, try predicting a couple of examples manually (i.e. with model.predict instead of model.evaluate). I cannot do it myself since I don't have your data, but I bet the results you get will not match the perfect accuracy implied by your model.evaluate results.
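
A manual check along these lines could look like the sketch below. The helper name `manual_accuracy`, the 0.5 threshold, and the example numbers are all assumptions for illustration; in practice `preds` would come from `model.predict(X_test)` and `y_true` would be `y_test`.

```python
import numpy as np

def manual_accuracy(y_true, y_pred, threshold=0.5):
    """Threshold raw model outputs and compare them with the 0/1 labels."""
    y_true = np.asarray(y_true).ravel()
    labels = (np.asarray(y_pred).ravel() > threshold).astype(int)
    return float(np.mean(labels == y_true))

# Made-up values, shaped like the (n_samples, 1) output of model.predict.
y_true = np.array([1, 0, 1, 1, 0])
preds = np.array([[0.9], [0.2], [0.4], [0.7], [0.6]])

print(manual_accuracy(y_true, preds))
```

If the number computed this way differs from the accuracy reported by model.evaluate, the reported metric is not measuring what it appears to.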

To the heart of your issue: since you have a binary classification problem, you should definitely ask for loss='binary_crossentropy' in your model compilation, and not mse.
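
As a rough sketch of what that change might look like, here is a variant of `build_model2` written against `tensorflow.keras`. The function name `build_classifier` is hypothetical, and the sigmoid output layer is an assumption on my part (the usual pairing with binary_crossentropy, and in line with the ReLU remark in the comments); the layer sizes are simply copied from the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(window, n_features, dropout=0.2):
    """Binary classifier: sigmoid output trained with binary cross-entropy."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(64),
        layers.Dropout(dropout),
        layers.Dense(16, activation="relu"),
        # Sigmoid keeps the output in [0, 1], so it can be read as P(up).
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model

# Assumed shapes: window of 5 days, 2 feature columns.
model = build_classifier(window=5, n_features=2)
probs = model.predict(np.zeros((3, 5, 2), dtype="float32"), verbose=0)
print(probs.shape)
```

With this setup the targets must be 0/1, and the reported accuracy is computed on the thresholded probabilities.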

I cannot be sure what exactly the value of 1.0 you get from model.evaluate represents, but as I show in the answer linked above, the evaluation metric Keras returns for a model compiled with metrics=['accuracy'] is highly dependent on the respective entry for loss. Even though I was eventually able to figure out what the issue was in that question, I cannot even begin to imagine what exactly goes on here, where you request the accuracy (i.e. a classification metric) for a regression loss (mse).

desertnaut