I want to use the historical data of two assets (BTC and ETH) to predict the next day's price.
The historical data consists of OHLCV data, market cap, and dominance for both assets, so all the features are numerical.
The prediction is a binary label (0 or 1) for the next day's price: 0 means the price will decrease tomorrow and 1 means it will increase.
Here is a screenshot of the initial data:
The values in the last column are shifted by -1 (one row up), so each row's data for today is paired with whether the next day is a green (up) or red (down) day.
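The labelling step can be sketched with pandas `shift(-1)`. This is a minimal sketch, not the original code: the column name `close`, the helper name `add_next_day_label`, and the choice to label an unchanged price as 0 are all assumptions. Rows are assumed to be sorted oldest first.

```python
import pandas as pd

def add_next_day_label(df: pd.DataFrame, price_col: str = "close") -> pd.DataFrame:
    """Return a copy of df with a 0/1 'target' column for the next day's move.

    target = 1 if the next row's price is higher than the current row's, else 0
    (assumption: an unchanged price counts as 0). Rows must be sorted oldest first.
    """
    out = df.copy()
    next_price = out[price_col].shift(-1)  # move tomorrow's price up onto today's row
    out["target"] = (next_price > out[price_col]).astype(int)
    # The newest row has no next day (next_price is NaN), so its label is meaningless.
    return out.iloc[:-1]

# Usage with hypothetical closing prices, oldest first.
prices = pd.DataFrame({"close": [10.0, 11.0, 10.5, 12.0]})
labeled = add_next_day_label(prices)
print(labeled["target"].tolist())  # [1, 0, 1]
```

Dropping the last row matters because `NaN > x` evaluates to `False`, which would otherwise silently label the newest day as 0.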
I used MinMaxScaler to scale the data, as below:
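import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()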
clean_df_scaled = min_max_scaler.fit_transform(all_data)
dataset = pd.DataFrame(clean_df_scaled)
# train / test / validation split
x_train, x_test, y_train, y_test = train_test_split(dataset.iloc[:, :15], dataset.iloc[:, 15], test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)
# convert labels to integer arrays (np.int no longer exists in recent NumPy; use int)
y_train = np.array(y_train, dtype=int)
y_test = np.array(y_test, dtype=int)
y_val = np.array(y_val, dtype=int)
# add a trailing axis so each sample has shape (15, 1): 15 timesteps of 1 value for the LSTM
x_train = np.reshape(np.asarray(x_train), (x_train.shape[0], x_train.shape[1], 1))
x_test = np.reshape(np.asarray(x_test), (x_test.shape[0], x_test.shape[1], 1))
x_val = np.reshape(np.asarray(x_val), (x_val.shape[0], x_val.shape[1], 1))
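Because the rows form a time series, a shuffled `train_test_split` mixes future days into the training set, and fitting `MinMaxScaler` on all rows lets the test rows influence the scaling. A minimal sketch of a time-ordered alternative, assuming the rows are sorted oldest first: it uses `train_test_split(..., shuffle=False)` and fits the scaler on the training rows only. The random arrays are hypothetical stand-ins for the real 15 feature columns and the 0/1 target.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the real data: 100 days (oldest first), 15 features, 0/1 target.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 15))
target = rng.integers(0, 2, size=100)

# shuffle=False keeps rows in time order: train = oldest rows, test = newest rows.
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, shuffle=False)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.2, shuffle=False)

# Fit the scaler on the training rows only, then reuse its min/max for later rows.
# Only the features are scaled; the 0/1 target is left as-is.
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
x_test = scaler.transform(x_test)
```

With these split sizes the proportions match the original code (about 64% train, 16% validation, 20% test), only without shuffling.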
Here is the model:
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(LSTM(64, input_shape=(x_train.shape[1], x_train.shape[2]), return_sequences=True))
model.add(LSTM(32))
model.add(Dense(8, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=100)
# note: this evaluates on the validation split (x_val), not on x_test
test_loss, test_acc = model.evaluate(x_val, y_val)
print('Test accuracy:', test_acc)
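In the reshape above, each row's 15 features become an array of shape (15, 1), so the LSTM reads one day's features as if they were 15 consecutive timesteps of a single value. If the intent is for the LSTM to look at several past days, its input would instead be windows of shape (samples, timesteps, features). A minimal NumPy sketch, assuming a hypothetical helper `make_windows` and a hypothetical window length of 10 days; the arrays are random stand-ins. Windows would be built separately within each time-ordered split so that no window spans two splits.

```python
import numpy as np

def make_windows(features, targets, window=10):
    """Build LSTM input of shape (samples, window, n_features).

    Each sample holds `window` consecutive days; its label is the target of the
    last day in the window (i.e. that day's next-day up/down label).
    """
    xs, ys = [], []
    for end in range(window, len(features) + 1):
        xs.append(features[end - window:end])
        ys.append(targets[end - 1])
    return np.stack(xs), np.array(ys)

# Hypothetical stand-ins: 50 days of 15 scaled features and 0/1 labels.
rng = np.random.default_rng(1)
feats = rng.random((50, 15))
day_labels = rng.integers(0, 2, size=50)

x_win, y_win = make_windows(feats, day_labels, window=10)
print(x_win.shape, y_win.shape)  # (41, 10, 15) (41,)
# The first LSTM layer would then take input_shape=(10, 15).
```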
Running this code produces the following output:
...
Epoch 98/100
24/24 [==============================] - 0s 12ms/step - loss: 0.6932 - accuracy: 0.4968
Epoch 99/100
24/24 [==============================] - 0s 12ms/step - loss: 0.6932 - accuracy: 0.4998
Epoch 100/100
24/24 [==============================] - 0s 13ms/step - loss: 0.6929 - accuracy: 0.5229
6/6 [==============================] - 1s 4ms/step - loss: 0.6931 - accuracy: 0.5027
Test accuracy: 0.5027027130126953
I don't understand what the problem is. I also tried a softmax activation, with no luck (I know sigmoid is the right choice here).
I even tried removing the LSTM layers and using only Dense layers, but still no luck.
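Two reference points help interpret the numbers in the log. A loss of about 0.6931 equals ln 2, which is the binary cross-entropy you get by predicting 0.5 for every sample, and an accuracy around 0.50 only means something next to a trivial baseline. A minimal sketch of one such baseline, always predicting the most frequent training label; the function name is hypothetical and the labels in the usage line are made up.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_eval):
    """Accuracy obtained by always predicting the most frequent label in y_train."""
    y_train = np.asarray(y_train, dtype=int)
    y_eval = np.asarray(y_eval, dtype=int)
    majority = np.bincount(y_train).argmax()  # most common class (ties -> 0)
    return float(np.mean(y_eval == majority))

# Hypothetical labels; with the real data, pass y_train and y_val (or y_test).
print(majority_baseline_accuracy([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.5
print(np.log(2))  # ~0.6931, the loss of predicting 0.5 everywhere
```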
P.S.
When I use the model to make a prediction:
predictions = model.predict(x_test)
It doesn't return binary values; it returns floats like these:
...
[0.5089301 ],
[0.5093736 ],
[0.5081916 ],
[0.50889516],
[0.5077091 ],
[0.5088633 ]], dtype=float32)
Is this normal? Should I convert them to binary (0 or 1) based on the mean value?
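With a single sigmoid output unit, `model.predict` returns the estimated probability of class 1 (price up) for each sample, so float outputs are expected. A common way to get 0/1 labels is to compare them against a threshold, conventionally 0.5; the helper name `to_binary` below is hypothetical. Applied to the six values printed above, every prediction becomes 1, which reflects that the model assigns nearly the same probability to every day.

```python
import numpy as np

def to_binary(probs, threshold=0.5):
    """Map sigmoid outputs (estimated probability that the price goes up) to 0/1 labels."""
    return (np.asarray(probs).ravel() >= threshold).astype(int)

# The six predictions printed above, in the (n, 1) shape model.predict returns.
preds = np.array([[0.5089301], [0.5093736], [0.5081916],
                  [0.50889516], [0.5077091], [0.5088633]], dtype=np.float32)
labels = to_binary(preds)
print(labels)  # [1 1 1 1 1 1]
```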