I am attempting to write a custom loss function in Keras from this paper. Namely, the loss I want to create is this:

E = \sum_i \frac{1}{|Y_i|\,|\bar{Y}_i|} \sum_{(k,l) \in Y_i \times \bar{Y}_i} \exp\left(-(c_k^i - c_l^i)\right)

This is a type of ranking loss for multi-class multi-label problems. Here are the details:

Y_i = set of positive labels for sample i
Y_i^bar = set of negative labels for sample i (complement of Y_i)
c_j^i = prediction on the i-th sample at label j
In what follows, both y_true and y_pred are of dimension 18.
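To make sure I understand the formula, here is a plain-NumPy version of the term for a single sample. This is just a reference I wrote to check against (the function name is mine), not the Keras loss itself:

import numpy as np

def ranking_loss_single(y_true, y_pred):
    """Loss above for one sample: y_true is a 0/1 vector of length 18,
    y_pred the corresponding vector of predictions."""
    pos = np.where(y_true == 1)[0]   # Y_i
    neg = np.where(y_true == 0)[0]   # Y_i^bar
    # all (k, l) pairs of exp(-(c_k^i - c_l^i)), via broadcasting
    pairs = np.exp(-(y_pred[pos][:, None] - y_pred[neg][None, :]))
    return pairs.sum() / (len(pos) * len(neg))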
import numpy as np
import tensorflow as tf
from keras import backend as K

def multilabel_loss(y_true, y_pred):
    """Multi-label ranking loss.
    More complete description here...
    """
    zero = K.tf.constant(0, dtype=tf.float32)
    where_one = K.tf.not_equal(y_true, zero)
    where_zero = K.tf.equal(y_true, zero)
    Y_p = K.tf.where(where_one)
    Y_n = K.tf.where(where_zero)
    n = K.tf.shape(y_true)[0]
    loss = 0
    for i in range(n):
        # Here i is the i-th sample; for a specific i, I find all locations
        # where Y_p, Y_n belong to the i-th sample; axis 0 denotes
        # the sample index space
        Y_p_i = K.tf.equal(Y_p[:, 0], K.tf.constant(i, dtype=tf.int64))
        Y_n_i = K.tf.equal(Y_n[:, 0], K.tf.constant(i, dtype=tf.int64))
        # Here I plug in those locations to get the values
        Y_p_i = K.tf.where(Y_p_i)
        Y_n_i = K.tf.where(Y_n_i)
        # Here I get the label indices of the values above
        Y_p_ind = K.tf.gather(Y_p[:, 1], Y_p_i)
        Y_n_ind = K.tf.gather(Y_n[:, 1], Y_n_i)
        # Here I compute the cardinalities |Y_i| and |Y_i^bar|
        yi = K.tf.shape(Y_p_ind)[0]
        yi_not = K.tf.shape(Y_n_ind)[0]
        # The value to normalize the inner summation
        normalizer = K.tf.divide(1, K.tf.multiply(yi, yi_not))
        # This creates a matrix of all combinations of indices (k, l) from the
        # above equation; then it is reshaped
        prod = K.tf.map_fn(lambda x: K.tf.map_fn(lambda y: K.tf.stack([x, y]), Y_n_ind), Y_p_ind)
        prod = K.tf.reshape(prod, [-1, 2, 1])
        prod = K.tf.squeeze(prod)
        # Next, the index pairs are fed into the corresponding prediction
        # vector; the differences exp(-(c_k - c_l)) are exponentiated and summed
        y_pred_gather = K.tf.gather(y_pred[i, :], prod)
        s = K.tf.cast(K.sum(K.tf.exp(K.tf.subtract(y_pred_gather[:, 1], y_pred_gather[:, 0]))), tf.float64)
        loss = loss + K.tf.multiply(normalizer, s)
    return loss
My questions are the following:
- When I go to compile my graph, I get an error revolving around n. Namely, TypeError: 'Tensor' object cannot be interpreted as an integer. I've looked around, but I can't find a way to stop this (a minimal repro is sketched just after this list). My hunch is that I need to avoid a for loop altogether, which brings me to:
- How can I write this loss without for loops? I'm fairly new to Keras and have spent a solid few hours writing this custom loss myself. I'd love to write it more concisely. What's blocking me from using all matrices is the fact that Y_i and its complement can take on different sizes for each i.
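To isolate the first issue from the loss code, this much is enough to reproduce the error with graph-mode TensorFlow 1.x (the variable names are just for the repro):

import tensorflow as tf

x = tf.zeros((4, 18))
n = tf.shape(x)[0]  # a symbolic scalar tensor, not a Python int
for i in range(n):  # TypeError: 'Tensor' object cannot be interpreted as an integer
    pass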
Please let me know if you'd like me to elaborate more on my code. Happy to do so.
UPDATE 3
As per @Parag S. Chandakkar's suggestions, I have the following:
def multi_label_loss(y_true, y_pred):
    # set consistent casting
    y_true = tf.cast(y_true, dtype=tf.float64)
    y_pred = tf.cast(y_pred, dtype=tf.float64)

    # this gets all positive and negative predictions,
    # exponentiated according to their respective Y_i classes:
    # exp(-c_k) for k in Y_i and exp(c_l) for l in Y_i^bar
    PT = K.tf.multiply(y_true, tf.exp(-y_pred))
    PT_complement = K.tf.multiply((1 - y_true), tf.exp(y_pred))

    # this step gets the weight vector 1 / (|Y_i| * |Y_i^bar|) that we'll normalize by
    m = K.shape(y_true)[0]
    W = K.tf.multiply(K.sum(y_true, axis=1), K.sum(1 - y_true, axis=1))
    W_inv = 1. / W
    W_inv = K.reshape(W_inv, (m, 1))

    # this step computes the outer product of two tensors
    def outer_product(inputs):
        """
        inputs: list of two tensors (of equal dimensions)
        for which you need to compute the outer product
        """
        x, y = inputs
        batchSize = K.shape(x)[0]
        outerProduct = x[:, :, np.newaxis] * y[:, np.newaxis, :]
        outerProduct = K.reshape(outerProduct, (batchSize, -1))
        # returns a flattened batch-wise set of tensors
        return outerProduct

    # set up inputs to outer product: each row of the result contains
    # every exp(c_l - c_k) pair for that sample
    inputs = [PT, PT_complement]

    # compute final loss: weight each row, then sum over pairs and samples
    loss = K.sum(K.tf.multiply(W_inv, outer_product(inputs)))
    return loss
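For completeness, this is how I'm sanity-checking it numerically and plugging it into training. The batch here is a toy one I made up; note every row needs at least one positive and one negative label, or W becomes zero:

import numpy as np
from keras import backend as K

# toy batch: 4 samples, 18 labels, 3 positives per row
y_true = np.zeros((4, 18))
y_true[:, :3] = 1
y_pred = np.random.randn(4, 18)

print(K.eval(multi_label_loss(K.constant(y_true), K.constant(y_pred))))

# and for training it drops in like any built-in loss:
# model.compile(optimizer='adam', loss=multi_label_loss)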