
Say I have a classification problem with 30 potential binary labels. These labels are not mutually exclusive. The labels tend to be sparse: on average only 1 of the 30 labels is positive per observation, though sometimes more than one. In the following code, how can I penalize the model for predicting all zeros? Accuracy will be high, but recall will be awful!

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model


OUTPUT_NODES = 30
np.random.seed(0)


def get_dataset():
    """
    Get a dataset of X and y. This is a learnable problem as there is some signal in the features. 10% of the time, a
    positive-output's index will also have a positive feature for that index
    :return: X and y data for training
    """
    n_observations = 30000
    y = np.random.rand(n_observations, OUTPUT_NODES)
    y = (y <= (1 / OUTPUT_NODES)).astype(int)  # Makes a sparse output where there is roughly 1 positive label: ((1 / OUTPUT_NODES) * OUTPUT_NODES ≈ 1)

    X = np.zeros((n_observations, OUTPUT_NODES))
    for i in range(len(y)):
        for j, feature in enumerate(y[i]):
            if feature == 1:
                X[i][j] = 1 if np.random.rand(1) > 0.9 else 0  # Makes the input features more noisy
                # X[i][j] = 1  # Using this instead will make the model perform very well

    return X, y


def create_model():
    input_layer = Input(shape=(OUTPUT_NODES, ))
    dense1 = Dense(100, activation='relu')(input_layer)
    dense2 = Dense(100, activation='relu')(dense1)
    output_layer = Dense(OUTPUT_NODES, activation='sigmoid')(dense2)

    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['Recall'])

    return model


def main():
    X, y = get_dataset()
    model = create_model()
    model.fit(X, y, epochs=10, batch_size=10)

    X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))
    y_pred = model.predict(X_pred)

    print(X_pred)
    print(y_pred.round(1))


if __name__ == '__main__':
    main()

I believe I read here that I could use:

weighted_cross_entropy_with_logits

to address this issue. How would that affect my final output layer's activation function? Would I even need an activation function? How do I specify a penalty for misclassifying a true positive class?

Jason p

1 Answer


OK, this is an interesting problem. First, you need to define a weighted cross-entropy loss wrapper:

import tensorflow as tf

def wce_logits(positive_class_weight=1.):
    def mylossw(y_true, logits):
        # Weight the positive class, then average over the batch
        cross_entropy = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(
            logits=logits, labels=tf.cast(y_true, dtype=tf.float32),
            pos_weight=positive_class_weight))
        return cross_entropy
    return mylossw

positive_class_weight is the weight applied to the positive class. The wrapper is needed because Keras expects a loss function that takes only y_true and y_pred as inputs, while tf.nn.weighted_cross_entropy_with_logits also needs pos_weight. Note that y_true must be cast to float32.
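
As a quick sanity check (toy numbers of my own, not from the post): with a pos_weight well above 1, an "all zeros" prediction becomes expensive whenever positives are present.

loss_fn = wce_logits(positive_class_weight=27.)
y_true = tf.constant([[1., 0., 0.], [0., 1., 0.]])
logits = tf.constant([[-4., -4., -4.], [-4., -4., -4.]])  # a model that always predicts "no"
print(float(loss_fn(y_true, logits)))  # large value, dominated by the weighted missed positives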

Second, you cannot use the predefined Recall metric, because it does not work with logits. I found a workaround in this discussion:

class Recall(tf.keras.metrics.Recall):
    """Recall that can optionally apply a sigmoid to logits before updating."""

    def __init__(self, from_logits=False, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._from_logits = from_logits

    def update_state(self, y_true, y_pred, sample_weight=None):
        if self._from_logits:
            # Convert logits to probabilities before the standard recall update
            super(Recall, self).update_state(y_true, tf.nn.sigmoid(y_pred), sample_weight)
        else:
            super(Recall, self).update_state(y_true, y_pred, sample_weight)
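
A quick way to check that the subclass behaves as expected (toy values, not from the original post):

m = Recall(from_logits=True)
m.update_state(tf.constant([[1., 0., 1.]]),     # true labels
               tf.constant([[3., -3., -3.]]))   # logits; sigmoid ≈ [0.95, 0.05, 0.05]
print(float(m.result()))                        # 0.5: one of the two positives is recovered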

Finally, you need to remove the sigmoid activation from the last layer, since the model now outputs logits:

def create_model():
    input_layer = Input(shape=(OUTPUT_NODES, ))
    dense1 = Dense(100, activation='relu')(input_layer)
    dense2 = Dense(100, activation='relu')(dense1)
    output_layer = Dense(OUTPUT_NODES)(dense2)  # no activation: the layer outputs logits

    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam',
                  loss=wce_logits(positive_class_weight=27.),
                  metrics=[Recall(from_logits=True)])

    return model
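
Because the last layer now outputs logits, model.predict returns raw scores; apply a sigmoid before thresholding at prediction time (a small sketch, not part of the original answer, reusing OUTPUT_NODES and create_model from above):

import numpy as np

model = create_model()
X_pred = np.random.randint(0, 2, (100, OUTPUT_NODES))  # dummy inputs, as in the question
logits = model.predict(X_pred)
probs = tf.nn.sigmoid(logits).numpy()  # map logits back to [0, 1]
y_hat = (probs >= 0.5).astype(int)     # threshold to get the binary labels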

Note that the positive class weight is set to 27 here. You can read a discussion on how to correctly calculate the weight.
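
One common heuristic (my own assumption, not stated in the linked discussion) is the ratio of negative to positive labels in the training data:

_, y = get_dataset()        # training labels from the question
n_pos = y.sum()             # total number of positive labels
n_neg = y.size - n_pos      # total number of negative labels
pos_weight = n_neg / n_pos  # ≈ 29 when roughly 1 of the 30 labels is positive
print(pos_weight)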

elbe
  • Great response. The methods you provided work and produce the desired result: the model now tends to predict 1's instead of all zeros. – Jason p Nov 07 '21 at 18:23
  • This problem is also interesting because you can then get the model to predict all 1's by making the positive_class_weight parameter a very large number (like 1000). – Jason p Nov 07 '21 at 18:24