
I want to use historical data for two assets (BTC, ETH) to make a prediction about the next day's price.
The historical data consists of OHLCV data, market cap, and dominance for both assets, so it is all numerical data.
The prediction is binary (0 or 1) for the next day's price, where 0 means the price will decrease and 1 means the price will increase tomorrow.

Here is a screenshot of the initial data:

[screenshot of the initial data table]

The values in the last column (the target) are shifted up by one row (i.e. shift(-1)). Hence, today's data is used to predict whether the next day is a green or red day.
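For reference, a minimal sketch of how such a shifted target can be built; the file name and the column name btc_close are assumptions, not the actual ones from the screenshot:

import pandas as pd

# one row per day; numeric OHLCV / market-cap / dominance features for both assets
all_data = pd.read_csv('crypto_daily.csv')   # hypothetical file name

# 1 if tomorrow's close is higher than today's, else 0 (shift(-1) pulls tomorrow's value up)
all_data['target'] = (all_data['btc_close'].shift(-1) > all_data['btc_close']).astype(int)

# the last row has no "tomorrow", so drop it
all_data = all_data.iloc[:-1]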
I used MinMaxScaler to scale the data, as below:

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

min_max_scaler = MinMaxScaler()
clean_df_scaled = min_max_scaler.fit_transform(all_data)
dataset = pd.DataFrame(clean_df_scaled)

from sklearn.model_selection import train_test_split
import numpy as np

# train / test / validation split
x_train, x_test, y_train, y_test = train_test_split(dataset.iloc[:, :15], dataset.iloc[:, 15], test_size=0.2)

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)

y_train = np.array(y_train, dtype=int)
y_test = np.array(y_test, dtype=int)
y_val = np.array(y_val, dtype=int)

# reshape to the (samples, timesteps, features) layout expected by the LSTM
x_train = np.reshape(np.asarray(x_train), (x_train.shape[0], x_train.shape[1], 1))
x_test = np.reshape(np.asarray(x_test), (x_test.shape[0], x_test.shape[1], 1))
x_val = np.reshape(np.asarray(x_val), (x_val.shape[0], x_val.shape[1], 1))

Here is the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(x_train.shape[1], x_train.shape[2]), return_sequences=True))
model.add(LSTM(32))
model.add(Dense(8, input_dim=16, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=100)

test_loss, test_acc = model.evaluate(x_val, y_val)
print('Test accuracy:', test_acc)

And the output shows:

...
Epoch 98/100
24/24 [==============================] - 0s 12ms/step - loss: 0.6932 - accuracy: 0.4968
Epoch 99/100
24/24 [==============================] - 0s 12ms/step - loss: 0.6932 - accuracy: 0.4998
Epoch 100/100
24/24 [==============================] - 0s 13ms/step - loss: 0.6929 - accuracy: 0.5229
6/6 [==============================] - 1s 4ms/step - loss: 0.6931 - accuracy: 0.5027
Test accuracy: 0.5027027130126953

I don't understand what the problem is here! I also tried the softmax activation, but no luck (I know I should use sigmoid for this).
I even tried removing the LSTM layers and using only Dense layers, but still no luck.

P.S:

When I use the model to make a prediction:

predictions = model.predict(x_test)

It doesn't return binary values; it returns floats like this:

...
[0.5089301 ],
       [0.5093736 ],
       [0.5081916 ],
       [0.50889516],
       [0.5077091 ],
       [0.5088633 ]], dtype=float32)

Is this normal? Should I convert them to binary (0 or 1) based on the mean value?

M. Safari
  • Regarding your question at the end, it is normal for a binary classifier to return floats instead of binary values since the classifier is supposed to predict a probability between 0 and 1. The 'accuracy' metric in Keras is calculated by treating any values above 0.5 as a 1, and any values below 0.5 as a 0. – Horace Lee May 13 '21 at 16:30
  • @HoraceLee thank you, I've searched and found that it's just like as you said. Do you have any idea about the first part? – M. Safari May 13 '21 at 16:36
  • I think the problem might be that the `train_test_split()` function shuffles your data, so the 'continuity' of your sequence is not preserved, if that makes sense. Maybe try adding `shuffle=False` to your calls to `train_test_split()`. (I don't have much experience with training sequence models so I might be wrong) – Horace Lee May 13 '21 at 17:02
  • I believe you don't need `input_dim` in the dense layer; also, have you tried a more complex network with only Dense layers? Looking at the loss it seems like the network is not learning at all (0.6931 is the max value for vanilla BCE AFAIK; see the short check after these comments). – Frightera May 16 '21 at 13:39
  • @Frightera I've just removed `input_dim` from the first dense layer, it still shows 0.51 for accuracy. I didn't get what you mean by trying a more complex network with only dense layers? – M. Safari May 16 '21 at 13:45
  • Dense layers with more units, like 128 etc. – Frightera May 16 '21 at 13:46
  • @Frightera it's still 0.51 even trying with 128 units dense layers... strange – M. Safari May 16 '21 at 13:48
  • Probably data needs more preprocessing, model configuration is correct. (sigmoid with BCE). – Frightera May 16 '21 at 13:54
  • Do mind that if you had a model that could predict with over 50% accuracy whether the price goes up or down, with expected outcome above zero, you'd have a money-making machine. Predicting future outcomes based on past performance is not going to give outstanding results, regardless of the model. – Lukasz Tracewski May 16 '21 at 14:59
  • @LukaszTracewski just look at similar projects; we can certainly predict the future from current data to some extent, which is exactly what technical analysis does on chart data. But yes, you are correct, we can't predict the future with high accuracy; even 70% would be a great result, where the system predicts 7 out of 10 correctly. Anyway, that is not what we want to prove here. There is something wrong in the model where it gets only 50% accuracy. – M. Safari May 16 '21 at 15:12
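The 0.6931 loss mentioned in the comments is exactly what a binary classifier reports when it always outputs 0.5, i.e. when it has learned no signal at all: binary cross-entropy at p = 0.5 is -ln(0.5) = ln 2 ≈ 0.6931. A quick numeric check (plain NumPy, nothing from the question assumed):

import numpy as np

# BCE per sample is -(y*ln(p) + (1-y)*ln(1-p)); with p = 0.5 both branches give ln(2)
p = 0.5
print(-np.log(p))   # 0.6931..., matching the plateau in the training log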

3 Answers


Safari, I believe, as Horace Lee already pointed out in a comment, that the problem lies in train_test_split, but there is also a problem with the data arrangement. In the example data, and in the way train_test_split is used, each row represents a sample and each column a feature, yet the time series you are trying to model is encoded column-wise. When the data is fed into the model this way, the time-dependent relationship does not exist, because each sample only contains information about a single data point. The LSTM layers therefore cannot find any temporal relationship: the sequence dependency is not encoded row-wise.

You can split the data in the same proportion as you did, but taking it in order, without shuffling.

split = int(len(dataset) * 0.8)
x_train, y_train = dataset.iloc[:split, :15], dataset.iloc[:split, 15]
x_test, y_test = dataset.iloc[split:, :15], dataset.iloc[split:, 15]

And set shuffle=False in model.fit to prevent any shuffling of the data during training; that retains the sequence dependency in the data. For example:
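# minimal sketch; the other fit arguments are unchanged from the question
history = model.fit(x_train, y_train, epochs=100, shuffle=False)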

Also, since each column in the data set is a time series, you can use a window method to model each time series independently: take a window-sized fragment and slide it through the data one time step at a time.

# window: chosen window size; feature_column: positional index of any one feature column
window_dataset = [dataset.iloc[k:k + window, feature_column] for k in range(int(len(dataset) * 0.8))]

target = [dataset.iloc[k + window, 15] for k in range(int(len(dataset) * 0.8))]
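A fuller sketch of the same idea, building a 3-D array from all 15 feature columns at once so that the timesteps are encoded row-wise (the window size of 7 is an assumed example; column 15 is the target, as in the question):

import numpy as np

window = 7   # assumed window size
X, y = [], []
for k in range(len(dataset) - window):
    X.append(dataset.iloc[k:k + window, :15].to_numpy())   # (window, 15) block of past days
    y.append(int(dataset.iloc[k + window, 15]))            # up/down label for the day after the window

X, y = np.asarray(X), np.asarray(y)   # X shape: (samples, window, 15)

split = int(len(X) * 0.8)             # chronological split, no shuffling
x_train, x_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# with this layout the LSTM would use input_shape=(window, 15)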

But before trying the LSTM architecture, try a sequential model with only Dense layers, or a single-layer LSTM, and check for class imbalance in the data with data['target_header'].value_counts(). Taking a continuous fragment of the data can leave you with more samples of one particular class.

TavoGLC

The following answer addresses your question about the scores returned by .predict.

When you call model.predict(x_test), it returns an array in which each row is the predicted probability that the corresponding instance of x_test belongs to class 1.

...
[0.5089301 ],
       [0.5093736 ],
       [0.5081916 ],
       [0.50889516],
       [0.5077091 ],
       [0.5088633 ]], dtype=float32)

In order to get a binary output, we normally set a threshold value (say 0.5); predictions greater than the threshold are treated as class 1, and those below it as class 0. So, you can do the following to get a binary output (1 and 0):

(model.predict(x_test) > 0.5).astype("int32") 

Here, 0.5 is the threshold we pick. Check this answer for more details.
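If you want to confirm that this thresholding matches the accuracy Keras reports, here is a small check (assuming the x_test and y_test arrays from the question):

import numpy as np

probs = model.predict(x_test)                      # floats in [0, 1]
preds = (probs > 0.5).astype("int32").ravel()      # hard 0/1 labels
print((preds == y_test).mean())                    # same value as the accuracy from model.evaluate(x_test, y_test)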

Innat

What worked for me was to increase the learning rate (not so much that the gradients explode) to the point where the outputs stop all being 0.5 and start to become more varied; once that point is reached, I slowly reduce the learning rate until I get an acceptable solution. Gradient descent navigates a surface with many local solutions, and some saddle points and local minima are far from the optimal solution; once the model is in such a local minimum, small updates cannot get it out. Probably the initialization puts the model into a local minimum, and the only way to jump to a better one is to use a higher learning rate. But this method requires a lot of fine tuning and trial and error.
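A sketch of how this can be set up in Keras; the starting learning rate of 1e-2 and the ReduceLROnPlateau settings are illustrative assumptions, not values from the question:

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# start with a deliberately high learning rate, then let the callback shrink it as the loss plateaus
model.compile(optimizer=Adam(learning_rate=1e-2),
              loss='binary_crossentropy', metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5, min_lr=1e-5)
history = model.fit(x_train, y_train, epochs=100, callbacks=[reduce_lr])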
