
I have created a custom loss function to deal with a binary class imbalance, but the loss does not improve from epoch to epoch. For metrics, I'm using precision and recall.

Is this a design issue where I'm not picking good hyper-parameters?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

weights = [np.array([.10, .90]), np.array([.5, .5]), np.array([.1, .99]), np.array([.25, .75]), np.array([.35, .65])]
for weight in weights:
    print('Model with weights {a}'.format(a=weight))
    model = keras.models.Sequential([
        keras.layers.Flatten(),  # input_shape=[X_train.shape[1]]
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')])
    model.compile(loss=weighted_loss(weight),
                  metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

    n_epochs = 10
    history = model.fit(X_train.astype('float32'), y_train.values.astype('float32'),
                        epochs=n_epochs, batch_size=64,
                        validation_data=(X_test.astype('float32'), y_test.values.astype('float32')))
    model.evaluate(X_test.astype('float32'), y_test.astype('float32'))
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True); plt.gca().set_ylim(0, 1); plt.show()

Custom loss function to deal with the class imbalance issue:

def weighted_loss(weights):
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        # scale predictions so that they sum to 1 along the last axis
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        # clip to avoid log(0)
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # weighted log-likelihood of the true class
        loss = y_true * K.log(y_pred) * weights
        loss = -K.sum(loss, -1)
        return loss
    return loss

Output:

Model with weights [0.1 0.9]
Epoch 1/10
274/274 [==============================] - 1s 2ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 2/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 3/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 4/10
274/274 [==============================] - 0s 969us/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
[...]

Input dataset: a 17480 x 20 matrix (a screenshot of the data and the true y class labels was attached to the original post).

y is the output array (2 classes) with dimensions (17480 x 1), and the total number of 1's is 1748 (the class that I want to predict).

  • I think the reason is that model is too simple, try to add more Dense layers at bottom of the model. – TaQuangTu Nov 30 '20 at 03:12
  • Hello @Josh, it would be nice if you could provide a reproducible example. Maybe though a kaggle notebook or google colab. – tornikeo Dec 01 '20 at 13:51

1 Answer


Since there is no MWE present, it's rather difficult to be sure. To be as educative as possible, I'll lay out some observations and remarks.

The first observation is that your custom loss takes on really small values (~1e-8) throughout training. This tells the optimizer that performance is already essentially perfect while, judging by the metrics you chose, it clearly isn't. That points to a problem near the output or in the loss function itself. Most likely, because your network has a single sigmoid output unit, K.sum(y_pred, axis=-1, keepdims=True) is just y_pred itself, so the division in your loss turns every prediction into 1 and the log term collapses to roughly zero, leaving no gradient to learn from. Since you have a classification problem, my recommendation is to have a look at this post regarding weighted cross-entropy [1].
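
For a single sigmoid output you can weight the two terms of the ordinary binary cross-entropy directly. A minimal sketch, assuming a two-element weights array [w_negative, w_positive] as in your snippet:

from tensorflow.keras import backend as K

def weighted_binary_crossentropy(weights):
    # weights is assumed to be [w_negative, w_positive], as in the question
    w_neg, w_pos = float(weights[0]), float(weights[1])
    def loss(y_true, y_pred):
        # clip to avoid log(0)
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # weight the positive and negative terms of the usual binary cross-entropy
        bce = -(w_pos * y_true * K.log(y_pred)
                + w_neg * (1 - y_true) * K.log(1 - y_pred))
        return K.mean(bce, axis=-1)
    return loss

Keras can also do this reweighting for you via the class_weight argument of model.fit (e.g. class_weight={0: 0.1, 1: 0.9}), without any custom loss.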

The second observation is that you don't seem to have a performance benchmark for your model. In general, an ML workflow goes from very simple to complex models. I would recommend trying a simple logistic regression [2] first to get an idea of the minimal achievable performance. After that, try some more complex models such as a gradient tree booster (XGBoost/LightGBM/...) or a random forest. This is especially relevant because you are applying a full-blown neural network to tabular data with only about 20 numerical features, which still tends to be traditional machine-learning territory.
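
A minimal baseline sketch, assuming X_train, X_test, y_train and y_test are the same objects used in your Keras snippet:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# class_weight='balanced' reweights the classes inversely to their frequencies
baseline = LogisticRegression(class_weight='balanced', max_iter=1000)
baseline.fit(X_train, y_train)

y_pred = baseline.predict(X_test)
print('precision:', precision_score(y_test, y_pred))
print('recall:   ', recall_score(y_test, y_pred))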

Once you have obtained a baseline and perhaps improved performance using a standard machine learning technique, you can look towards a neural network again. Some other recommendations depending on the results of the traditional approaches are:

  • Try several optimizers and cross-validate them over different learning rates.

  • Try, as mentioned by @TaQuangTu, some simpler and shallower architectures.

  • Try an activation function that does not have the "dying neuron" problem, such as LeakyReLU or ELU (see the sketch after this list).
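
Putting the last two points together, here is a minimal sketch; it assumes the same X_train/y_train as in your code and reuses the weighted_binary_crossentropy sketch from above, so treat those names as placeholders for your own data and loss:

from tensorflow import keras

def build_model(learning_rate=1e-3):
    # shallow network with LeakyReLU instead of plain ReLU
    model = keras.models.Sequential([
        keras.layers.Dense(16, input_shape=(X_train.shape[1],)),
        keras.layers.LeakyReLU(alpha=0.1),
        keras.layers.Dense(1, activation='sigmoid')])
    # explicit optimizer so the learning rate can be cross-validated
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss=weighted_binary_crossentropy([0.1, 0.9]),
                  metrics=[keras.metrics.Precision(), keras.metrics.Recall()])
    return model

for lr in [1e-2, 1e-3, 1e-4]:
    model = build_model(lr)
    history = model.fit(X_train.astype('float32'), y_train.values.astype('float32'),
                        epochs=10, batch_size=64, validation_split=0.2, verbose=0)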

Hopefully this answer helps you; if you have any more questions, I am glad to help.

[1] Unbalanced data and weighted cross entropy

[2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

  • Thank you for the note above, will look into weighted cross-entropy. This is not shown here, but I do have baseline models of logistic regression/more complex. I was just curious at the output of keras run. – Josh Nov 30 '20 at 12:32
  • Out of curiosity what performance discrepancy do you have between LogReg and the keras model? – David Vander Mijnsbrugge Nov 30 '20 at 13:34