
I have a generator function that infinitely cycles over some directories of images and outputs 3-tuples of batches of the form

[img1, img2], label, weight

where img1 and img2 are batch_size x M x N x 3 tensors, and label and weight are each batch_size x 1 tensors.

I provide this generator to the fit_generator function when training a model with Keras.

For this model I have a custom cosine contrastive loss function,

def cosine_contrastive_loss(y_true, y_pred):
    # y_pred is a cosine similarity, so convert it to a distance first
    cosine_distance = 1 - y_pred
    margin = 0.9
    cdist = y_true * cosine_distance + (1 - y_true) * keras.backend.maximum(margin - cosine_distance, 0.0)
    return keras.backend.mean(cdist)

Structurally everything runs OK with my model. There are no errors and it is consuming the inputs and labels from the generator as expected.

But now I am seeking to directly use the weights parameter per each batch and perform some customized logic inside of cosine_contrastive_loss based on the sample-specific weight.

How can I access this parameter from the structure of a batch of samples at the moment of the loss function being executed?

Note that since it is an infinitely cycling generator, it is not possible to precompute weights or compute them on the fly to either curry the weights into the loss function or generate them.

They have to be generated in unison with the samples being generated, and indeed there is custom logic in my data generator that determines the weights dynamically from properties of img1, img2 and label at the moment they are generated for a batch.
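For concreteness, here is a minimal sketch of such a generator. The random arrays stand in for images loaded from disk, and the weighting rule (up-weighting positive pairs) is a made-up placeholder for the custom per-batch logic described above:

```python
import numpy as np

def paired_image_generator(batch_size=4, M=32, N=32):
    # Hypothetical sketch: random arrays stand in for images read from
    # directories; the weight rule below is a placeholder for the custom
    # logic that derives weights from img1, img2 and label per batch.
    while True:
        img1 = np.random.rand(batch_size, M, N, 3).astype("float32")
        img2 = np.random.rand(batch_size, M, N, 3).astype("float32")
        label = np.random.randint(0, 2, size=(batch_size, 1)).astype("float32")
        weight = np.where(label == 1.0, 2.0, 1.0)  # computed at generation time
        yield [img1, img2], label, weight

gen = paired_image_generator()
(img1, img2), label, weight = next(gen)
```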

ely

2 Answers


Manual training loop alternative

The only thing I can think of is a manual training loop where you get the weights yourself.

Have a weights tensor and a fixed (non-variable) batch size:

import numpy as np
from keras import backend as K

weights = K.variable(np.zeros((batch_size,)))

Use them in your custom loss:

def custom_loss(true, pred):
    return someCalculation(true, pred, weights)
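The essential trick is that the loss closes over `weights`, so updating the variable in place between batches changes what the next loss call sees. A numpy stand-in (with a made-up `someCalculation`, a per-sample weighted absolute error) illustrates the mechanism:

```python
import numpy as np

batch_size = 4
weights = np.zeros((batch_size,), dtype="float32")  # stands in for K.variable

def custom_loss(true, pred):
    # Illustrative someCalculation: weighted mean absolute error,
    # reading weights from the enclosing scope.
    return float(np.mean(weights * np.abs(true - pred)))

weights[:] = [1.0, 2.0, 1.0, 2.0]  # stands in for K.set_value(weights, w)
loss = custom_loss(np.ones(batch_size), np.zeros(batch_size))  # mean of [1, 2, 1, 2] = 1.5
```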

For a "generator":

for e in range(epochs):
    for s in range(steps_per_epoch):
        x, y, w = next(generator) #or generator.next(), not sure
        K.set_value(weights, w)

        model.train_on_batch(x, y)

For a keras.utils.Sequence:

for e in range(epochs):
    for s in range(len(generator)):
        x, y, w = generator[s]

        K.set_value(weights, w)
        model.train_on_batch(x, y)

I know this answer is not optimal because it does not parallelize getting data from the generator as fit_generator does, but it's the best easy solution I can think of. Keras doesn't expose the weights; they are applied automatically in some hidden source code.


Let the model calculate the weights alternative

If calculating the weights can be done from x and y, you can delegate this task to the loss function itself.

This is sort of hacky, but may work:

input1 = Input(shape1)
input2 = Input(shape2)

# .... model creation .... #

model = Model([input1, input2], outputs)

Let the loss have access to input1 and input2:

def custom_loss(y_true, y_pred):
    w = calculate_weights(input1, input2, y_pred)
    # .... rest of the loss .... #

The issue here is whether or not you can compute the weights as a tensor from the inputs.
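As a hedged illustration of what such a `calculate_weights` could look like as pure tensor math, here is one made-up rule (shown in numpy; each op has a Keras backend equivalent such as `K.mean` and `K.abs`, so the same shape of code works on symbolic tensors inside the loss):

```python
import numpy as np

def calculate_weights(input1, input2, y_pred):
    # Hypothetical rule: weight each sample by the mean absolute pixel
    # difference between its two images, so "hard" pairs count more.
    # input1, input2: (batch_size, M, N, 3); result: (batch_size,)
    diff = np.mean(np.abs(input1 - input2), axis=(1, 2, 3))
    return 1.0 + diff
```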

Daniel Möller
  • thanks! it's definitely not ideal to have to manually call train_on_batch like this, but it offers a solid work around, very appreciated! – ely Sep 19 '19 at 13:19
  • Added an alternative answer at the end, that might be interesting to test. – Daniel Möller Sep 19 '19 at 13:34
  • The added answer is a nice idea, but unfortunately in my case the weights depend on metadata about the inputs that is accessed at the time the inputs are generated and loaded in the guts of the batch generator. Basically, the batch generator iterates over manifests of files and metadata about those files to determine how to sample positive and negative pairs of images. Depending on other metadata, some positive and negative images get larger weights for training because the model has to perform well on them at prediction time after training. – ely Sep 19 '19 at 15:14
  • I see.... if you do need a better performance, you can try to pass this metadata to the model as an extra input. Then multiply it by zero and add with the main input (just to have a connection to the output). This may sound extra complicated, but depending on the time it gets to load a batch, it may significantly increase your training speed. – Daniel Möller Sep 19 '19 at 15:20
  • The metadata is a lot of string data that is file metadata (timestamp of creation, who created it, properties about the circumstances when it was created). Most of it is pre-computed down to a simple set of criteria that can be checked by the guts of the data generator, which then has to actually load a batch worth of image pairs, labels, etc. The training speed has been quite fast for this and is not very limited by the batch loading time. I think to find a structured way for all that string data to be accepted into tensors in the model would really make the code complexity very problematic. – ely Sep 19 '19 at 15:22
  • Is there any way you could embed the sample weights (or any other generator value) in the `y_pred` to then be extracted in the loss function? The only complication I can think of with that is that it would throw off the dimensions in your network. – seeiespi May 26 '20 at 04:02

In Keras for TensorFlow 2, the loss function is called with the sample weights:

output_loss = loss_fn(y_true, y_pred, sample_weight=sample_weight)

https://github.com/keras-team/keras/blob/tf-2/keras/engine/training.py
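So one option is to give your loss a third `sample_weight` argument and apply it per sample. A numpy sketch of the cosine contrastive loss from the question, extended this way (a backend version would swap `np` for `keras.backend` ops):

```python
import numpy as np

def weighted_cosine_contrastive_loss(y_true, y_pred, sample_weight=None, margin=0.9):
    # y_pred is a cosine similarity; convert to a distance first.
    cosine_distance = 1.0 - y_pred
    per_sample = (y_true * cosine_distance
                  + (1.0 - y_true) * np.maximum(margin - cosine_distance, 0.0))
    if sample_weight is not None:
        per_sample = per_sample * sample_weight  # weights from the generator
    return float(np.mean(per_sample))
```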


You can also write a custom training loop with GradientTape; see https://www.tensorflow.org/guide/keras/train_and_evaluate#part_ii_writing_your_own_training_evaluation_loops_from_scratch
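A minimal GradientTape sketch of that idea, where the per-sample weights from the generator are applied explicitly inside the training step (the tiny Dense model and absolute-error loss are placeholders; in the question this would be the two-input model and the contrastive loss):

```python
import numpy as np
import tensorflow as tf

# Placeholder single-input model standing in for the real two-input model.
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def train_step(x, y, w):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        per_sample = tf.abs(y - pred)          # stand-in per-sample loss
        loss = tf.reduce_mean(w * per_sample)  # apply generator weights here
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)

x = np.random.rand(4, 3).astype("float32")
y = np.ones((4, 1), dtype="float32")
w = np.array([[1.0], [2.0], [1.0], [2.0]], dtype="float32")
loss_value = train_step(x, y, w)
```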