Custom Loss Function in Keras with Sample Weights

Question

I am new to Tensorflow and Keras. I would like to use sample weights in a custom loss function.

If I understand correctly, this post (Custom loss function with weights in Keras) suggests including weights as an input into the network. As well as this: Custom weighted loss function in Keras for weighing each element

I am wondering if I am missing something (I'd also like to not define weights as a global variable). I am also a bit surprised that there is not a way to use it directly, since the Loss class _ _ call _ _ method accepts sample_weight as an argument but if I understand correctly the loss function must have only arguments y_true, and y_pred.

From the documentation (https://keras.io/api/losses/#creating-custom-losses), however:

Creating custom losses Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one of sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.

It sounds like one should be able to use sample weighting through the model.fit(..., sample_weight=sample_weight) method.

In this post (Should the custom loss function in Keras return a single loss value for the batch or an arrary of losses for every sample in the training batch? ) there is a lengthy discussion about the size of the output of the loss function.

And, lastly, it is also mentioned that when a custom loss function is created, then, an array of losses (individual sample losses) should be returned. Their reduction is handled by the framework.

It seems to me that if the custom_loss(y_true, y_pred) returns a tensor of size (batch_size, ) then one ought to be able to use sample_weight in fit method. What am I missing?

Thanks a lot for any help!

Code snippets:

class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
        num_components: number of components
        mu: means of the Gaussian components
        sg: standard deviations of the Gaussian components
    """

    def __init__(self, num_params=NUM_PARAMS_MG,
                 num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_params = num_params
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for MDN the output of the networks are _parameters_ of the
        predicted distribution, _not_ point-estimates

        Parameters
        ----------
        y_true: (batch_size, 1)
            Observed value of the random variable
        p_predict: (batch_size, num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of the batch (batch_size, 1)

        """
        alpha, mu, sg = tf.split(p_predict,
                                 num_or_size_splits=self.num_params, axis=1)
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        log_likelihood = tf.transpose(gm.log_prob(tf.transpose(y_true)))
        return -tf.reduce_mean(log_likelihood, axis=-1)

My hope was then to be able to use:

model.compile(optimizer=Adam(learning_rate=0.005),
                  loss=NegLogLikMixedGaussian(
                      num_components=2, num_params=3))

And:


# For testing purposes
sample_weight = np.ones(len(y_train)) / len(dh.y_train_scaled)  # this should give same results as un-weighted

# Some non-trivial weights
sample_weights = np.zeros(len(y_train))
sample_weights[:5] = 1
# This will give me same results as above


model.fit(x_train, y_train, sample_weight=sample_weight,
                      batch_size=128, epochs=10)

elbe · Accepted Answer · 2021-11-01T20:11:38.927

Your code is correct, except for a few details, if I understood what you want to do. The sample weights should be of dimension (number of samples,) though the loss should be of dimension (batch_size,). The sample weights can be passed to the fit method and it seems to work. In your custom loss class, num_components and num_params are initialized but only one of the two parameters is used in the call method. I'm not sure I understood the dimensions of the tensor (alpha, mu, sg), is it of dimension (batch_size, 3, num_components) and predicted by the model? Below is a code adapted from yours, in my understanding of your problem.

import tensorflow as tf
import numpy as np
from tensorflow.keras.losses import Loss, BinaryCrossentropy
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense, Concatenate

import tensorflow_probability as tfp
tfd = tfp.distributions

# parameters
num_components = 2
num_samples = 1001
num_features = 10

# synthetic data
x_train = np.random.normal(size=(num_samples, num_features))
y_train = np.random.normal(size=(num_samples, 1, num_components))

print(x_train.shape)
print(y_train.shape)

class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
        num_components: number of components
        mu: means of the Gaussian components
        sg: standard deviations of the Gaussian components
    """

    def __init__(self, num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for MDN the output of the networks are _parameters_ of the
        predicted distribution, _not_ point-estimates

        Parameters
        ----------
        y_true: (batch_size, 1, num_components)
            Observed value of the random variable
        p_predict: (batch_size, 3, num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of the batch (batch_size, 1)

        """
        alpha, mu, sg = tf.split(p_predict, num_or_size_splits=3, axis=1)
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        log_likelihood = gm.log_prob(y_true)
        return -tf.reduce_mean(log_likelihood, axis=[1, 2])

# the model (simple predicting (alpha, mu, sigma))
input = Input((num_features,))
alpha = tf.expand_dims(Dense(num_components, 'relu')(input), axis=1)+0.0001
# normalization
alpha = alpha/tf.reduce_sum(alpha, axis=2, keepdims=True)
mu = tf.expand_dims(Dense(num_components)(input), axis=1)
# sg > 0
sg = tf.expand_dims(Dense(num_components, 'relu')(input), axis=1)+ 0.0001

outputs = Concatenate(axis=1)([alpha, mu, sg])

model = Model(inputs=input, outputs=outputs, name='gmm_params')
model.compile(optimizer='adam', loss=NegLogLikMixedGaussian(num_components=num_components), run_eagerly=False)

sample_weight=np.ones((num_samples, ))
sample_weight[500:] = 0.

model.fit(x_train, y_train, batch_size=16, epochs=20, sample_weight=sample_weight)

Hi elbe, thank you very much, terrific! I'm looking over it. I saw that by changing the weights the prediction is different (and I fixing the seed when running the comparison). One follow-up question is what is the size of the output of the loss function. It seems to be it's a scalar and I verified that by calculating the loss on the first two data points separately, will give the same loss as a batch of the first two data points. How does the fit function know how to weigh individual samples? Thanks a lot! — deckard, Nov 02 '21 at 12:02
The output of the loss function is size (batch_size,). By the way, what is the target application of your code? — elbe, Nov 03 '21 at 08:22
A toy example where the inputs is scalar (Input(shape=(1, )) and output is a bi-modal distribution. Thanks again! — deckard, Nov 06 '21 at 16:11

Custom Loss Function in Keras with Sample Weights

1 Answers1