
I have two possible values for the prediction label, -1 or 1. Training looks good with either LSTM or Dense layers, but the prediction is always the same across different prediction datasets; changing the layers to Dense does not change it. Maybe I am doing something wrong.

Here is the code:

// set up data arrays
float[,,] training_data = new float[training.Count(), 12, 200];
float[,,] testing_data = new float[testing.Count(), 12, 200];
float[,,] predict_data = new float[1, 12, 200];

IList<float> training_labels = new List<float>();
IList<float> testing_labels = new List<float>();

// Load Data and add to arrays
...
...

/////////////////////////
NDarray train_y = np.array(training_labels.ToArray());
NDarray train_x = np.array(training_data);

NDarray test_y = np.array(testing_labels.ToArray());
NDarray test_x = np.array(testing_data);

NDarray predict_x = np.array(predict_data);

train_y = Util.ToCategorical(train_y, 2);
test_y = Util.ToCategorical(test_y, 2);

// Build sequential model
var model = new Sequential();

model.Add(new Input(shape: new Keras.Shape(12, 200)));
model.Add(new BatchNormalization());

model.Add(new LSTM(128, activation: "tanh", recurrent_activation: "sigmoid", return_sequences: false));            
model.Add(new Dropout(0.2));
model.Add(new Dense(32, activation: "relu"));            
model.Add(new Dense(2, activation: "softmax"));

model.Compile(optimizer: new SGD(), loss: "binary_crossentropy", metrics: new string[] { "accuracy" });
model.Summary();

var history = model.Fit(train_x, train_y, batch_size: 1, epochs: 1, verbose: 1, validation_data: new NDarray[] { test_x, test_y });

var score = model.Evaluate(test_x, test_y, verbose: 2);
Console.WriteLine($"Test loss: {score[0]}");
Console.WriteLine($"Test accuracy: {score[1]}");

NDarray predicted = model.Predict(predict_x, verbose: 2);
                    
Console.WriteLine($"Prediction: {predicted[0][0]*100}");
Console.WriteLine($"Prediction: {predicted[0][1]*100}");  

And this is the output:

    483/483 [==============================] 
    - 9s 6ms/step - loss: 0.1989 - accuracy: 0.9633 - val_loss: 0.0416 - val_accuracy: 1.0000
      4/4 - 0s - loss: 0.0416 - accuracy: 1.0000
    Test loss: 0.04155446216464043
    Test accuracy: 1
    1/1 - 0s

    Prediction: 0.0010418787496746518
    Prediction: 99.99896287918091

The same prediction data used in ML.NET gives different results, but with ML.NET the accuracy is only 0.6; that's why I need deep learning.

1 Answer

I don't have a setup to reproduce your code, but I see one small issue that you may need to consider (not sure if this is what caused the trouble). From your code, I think you're using the wrong loss function for training. As you set,

Util.ToCategorical(train_y, 2);
model.Add(new Dense(2, activation: "softmax"));

Then your loss function should be 'categorical_crossentropy' and not 'binary_crossentropy', because you transformed your labels (-1, 1) into one-hot encoded vectors and used a softmax activation in your last layer.

However, as you said, your labels are -1 and 1; so if you treat your problem as a binary classification problem, then the setup should be something like the following:

// Util.ToCategorical(train_y, 2);  // no one-hot transformation
model.Add(new Dense(1, activation: "sigmoid"));
model.Compile(..., loss: "binary_crossentropy");
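
For completeness, here is a minimal runnable sketch of that binary setup in Python / tf.keras (the random stand-in data and shapes are mine; note that binary_crossentropy expects 0/1 targets, so the -1/1 labels are remapped to 0/1 here, in line with the zero-based-label note below):

import numpy as np
import tensorflow as tf

# Stand-in data: 80 samples, 32 features each (hypothetical shapes).
x = tf.random.normal([80, 32])

# Raw labels in {-1, 1}, remapped to {0, 1} because
# binary_crossentropy expects 0/1 targets.
raw = np.random.choice([-1, 1], size=80)
y = ((raw + 1) // 2).astype('float32')

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=32, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))  # single unit, no one-hot labels

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y, epochs=10, verbose=2)

# Each prediction is a single probability for class 1 (your original label 1);
# values below 0.5 can be read back as the original label -1.
pred = model.predict(x)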

Update

Here I will give some working demo code for better understanding. But before that, one small note: let's say we have a training data set whose labels start below zero, e.g. [-2, -1, 0, 1]. To transform these integer values into one-hot encoded vectors, we can pick either the tf.keras.utils.to_categorical or the pd.get_dummies function. A small difference between these two methods is that with to_categorical our integer labels must start from 0, which is not the case with pd.get_dummies (please check my other answers on this). Shortly,

a = np.random.randint(-1, 1, size=(80))
array([-1, -1,  0,  0,  0, ...])

pd.get_dummies(a).astype('float32').values[:5] 
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.]], dtype=float32)

tf.keras.utils.to_categorical(a+1, num_classes = 2)[:5]
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [0., 1.]], dtype=float32)

Okay, now here is some working demo code.

img = tf.random.normal([80, 32], 0, 1, tf.float32)
tar = pd.get_dummies(np.random.randint(-1, 1,  # mine: [-1, 1) - yours: [-1, 1]
                                       size=80)).astype('float32').values 

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim = 32, 
                       kernel_initializer ='normal', 
                       activation= 'relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy', 
              optimizer='adam', metrics=['accuracy'])
model.fit(img, tar, epochs=10, verbose=2)
Epoch 1/10
3/3 - 0s - loss: 0.7610 - accuracy: 0.4375
Epoch 2/10
3/3 - 0s - loss: 0.7425 - accuracy: 0.4375
....
Epoch 8/10
3/3 - 0s - loss: 0.6694 - accuracy: 0.5125
Epoch 9/10
3/3 - 0s - loss: 0.6601 - accuracy: 0.5750
Epoch 10/10
3/3 - 0s - loss: 0.6511 - accuracy: 0.5750

Inference

loss, acc = model.evaluate(img, tar); print(loss, acc)
pred = model.predict(img); print(pred[:5])

3ms/step - loss: 0.6167 - accuracy: 0.7250
0.6166597604751587 0.7250000238418579

# probabilities of the predicted labels -1 and 0
[[0.35116166 0.64883834]
 [0.5542663  0.4457338 ]
 [0.28023133 0.71976864]
 [0.5024315  0.49756846]
 [0.41029742 0.5897026 ]]

Now, if we do

print(pred[0])
pred[0].argmax(-1) # expect: -1, 0 as our label 

[0.35116166 0.64883834]
1

It gives 0.35x and 0.64x for the target labels -1 and 0 respectively. But when we take .argmax of the predicted probabilities, it returns the zero-based index of the highest value (a reason to make the training labels start from zero; so I think in your case it's better to transform [-1, 1] to [0, 1]).
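
For example (a small sketch of mine, not from the original code), an explicit lookup array maps the zero-based argmax index back to the original labels:

import numpy as np

# Stand-in for the model.predict output above.
pred = np.array([[0.35, 0.65],
                 [0.55, 0.45]])

# Index 0 -> label -1, index 1 -> label 0.
classes = np.array([-1, 0])
predicted_labels = classes[pred.argmax(axis=-1)]
print(predicted_labels)  # [ 0 -1]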

Okay, lastly, as you mentioned, you want the predicted label and the corresponding confidence scores; to do that, we can use tf.math.top_k with k = the number of classes.

top_k_values, top_k_indices = tf.math.top_k(pred, k=2)
for values, indices in zip(top_k_values, top_k_indices):
    print(
        'For class {}, model confidence {:.2f}%'
        .format(indices.numpy()[0]-1, values.numpy()[0]*100)
        )
    
    print(
        'For class {}, model confidence {:.2f}%'
        .format(indices.numpy()[1]-1, values.numpy()[1]*100)
        )
    
    '''
    Note: above we subtract 1 to match the
          target labels (-1, 0).

    This would not be necessary if we initially
    transformed our labels from (-1, 0) to (0, 1),
    i.e. started them from zero.
    '''
    print()
    break # remove for full results 
For class 0, model confidence 64.88%
For class -1, model confidence 35.12%

Verifying the score order

# pick first samples: input and label
model(img)[0].numpy(), tar[0]

(array([0.35116166, 0.64883834], dtype=float32),
 array([0., 1.], dtype=float32))

Here, 0 stands for label -1 and 1 for label 0.

# Again, it's better to transform (-1, 0) to (0, 1) from the start.
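
Since your actual labels are -1 and 1 (not -1 and 0 as in my demo), a small sketch of that initial transformation could be:

import numpy as np

# Hypothetical raw labels in your {-1, 1} scheme.
raw_labels = np.array([-1, 1, 1, -1, 1])

# Map {-1, 1} -> {0, 1} so that to_categorical and argmax
# both line up with zero-based class indices.
labels = (raw_labels + 1) // 2
print(labels)  # [0 1 1 0 1]

# One-hot encoding is now safe:
# tf.keras.utils.to_categorical(labels, num_classes=2)
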
  • Thanks, I did as you said and now I get different prediction results, but only one value is present at predicted[0][0]; predicted[0][1] is null. With one data set I got the result 20.3 and with a different data set I got 12.2. How can I interpret this data as my -1 and 1 result? Also, the result changes even if I don't change the data. Thanks! – Mario Jul 07 '21 at 08:49
  • Changing to categorical_crossentropy, I get the same results again: Prediction: 0.06129330722615123 Prediction: 99.93870258331299 – Mario Jul 07 '21 at 08:52
  • For my first comment: I also tried predicted[0] and predicted[1]; the first value is the same and the second is null. – Mario Jul 07 '21 at 08:57
  • I need the predicted results to be something like this: (-1) 30% probability, (1) 70% probability. – Mario Jul 07 '21 at 09:06
  • I didn't know that the category values must start from 0; I will try that. I'm still trying to understand the code, because it is a little different from C#; also, not all Keras functions from Python are implemented in C# Keras, especially the numpy ones. – Mario Jul 08 '21 at 08:01
  • I see. Let me know if you need me to break down any part of my code. I wish I could answer in C#. However, the basics should be the same, that's for sure. You can run my code on Colab easily and verify each and every aspect. – Innat Jul 08 '21 at 08:12
  • I don't understand what the [:5] does in this line: pd.get_dummies(a).astype('float32').values[:5]. This is not present in C#. – Mario Jul 08 '21 at 09:23
  • 1
    `[:5]` is called array [slicing in python](https://www.w3schools.com/python/numpy/numpy_array_slicing.asp). It's not an important thing for you. I used it for **checking purposes**. Like, when I used `print(pred[:5])`, it means to print out the first 5 elements or **probabilities score of the first 5 samples**. – Innat Jul 08 '21 at 10:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/234651/discussion-between-m-innat-and-mario-m). – Innat Jul 08 '21 at 10:20
  • please go to the chatbox, I wrote some responses; please check. – Innat Jul 08 '21 at 10:33