
I'm writing a custom loss function in Keras and just tripped over the following:

Why do Keras loss functions have to return one scalar per batch item rather than just one scalar?

I care about the cumulative loss for the whole batch, not about the loss per item, don't I?

Alex
  • Because it's cleaner. The batch is just an abstraction used by an optimization algorithm. They built the loss functions so that it's the most intuitive to users. – Nassim Ben Mar 20 '17 at 08:08
  • @NassimBen I would actually claim the exact opposite. An objective function by its definition has a scalar output value, so it would be cleaner if Keras did it that way. And sure, the batch is just an abstraction, but the point of the cost function is to minimize the cost across all training examples (whether "all" refers to the mini-batch as in SGD, or the entire training set for deterministic gradient descent) simultaneously. – Alex Mar 20 '17 at 15:59
  • I think I figured it out: `fit()` has an argument `sample_weight` with which you can assign different weights to different samples in the batch. In order for this to work you need the loss function to return the loss per batch item. – Alex Mar 20 '17 at 16:28
  • Indeed... it's more flexible the way it is. I like it like this – Nassim Ben Mar 20 '17 at 17:23

1 Answer


I think I figured it out: fit() has an argument sample_weight with which you can assign different weights to different samples in the batch. In order for this to work you need the loss function to return the loss per batch item.
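A minimal sketch of why per-item losses are needed, using NumPy to mimic the reduction rather than depending on a TensorFlow install. The function name `per_item_mse` and the normalization by the weight sum are illustrative; the exact normalization Keras applies has varied across versions. The key point is that `sample_weight` multiplies each item's loss individually before the batch is reduced to a scalar, which would be impossible if the loss function already returned a single number:

```python
import numpy as np

def per_item_mse(y_true, y_pred):
    # One scalar per batch item (mean over the feature axis),
    # mirroring the shape a Keras loss function returns.
    return np.mean((y_pred - y_true) ** 2, axis=-1)

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.0, 2.0], [4.0, 6.0]])

losses = per_item_mse(y_true, y_pred)   # one loss per sample: [0.0, 2.5]

# Down-weight the second sample, as fit(..., sample_weight=...) would.
sample_weight = np.array([1.0, 0.5])

# Weighted reduction to the final scalar: weight each item, then average.
batch_loss = np.sum(losses * sample_weight) / np.sum(sample_weight)
```

With a uniform `sample_weight` this collapses to the plain mean, which is why the per-item convention costs nothing in the common case but enables per-sample weighting when you need it.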

Alex