
I am working on a weighted version of SparseCategoricalCrossentropy. Right now my implementation converts y_true to one-hot form, calculates the cross entropy, and then multiplies it with a weight matrix. I get the same output as SparseCategoricalCrossentropy when the weights are all 1; my problem is with the one-hot encoding. I have a lot of classes (32 + background), and with one-hot encoding I run out of memory for large images/batch sizes, which does not happen with SparseCategoricalCrossentropy. I am trying to figure out how the built-in one is implemented (is there a way to avoid one-hot encoding, etc.). How is the built-in one implemented, or where is it implemented? Looking at [1] it is probably implemented on the native side, but I cannot find it.

[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/losses.py#L692
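
Simplified, my current implementation is roughly the following (the names are placeholders, not my exact code); the tf.one_hot tensor is what blows up memory with many classes and large images:

import tensorflow as tf

def weighted_ce_onehot(y_true, y_pred, class_weights):
    # y_true: integer labels, shape (batch, H, W)
    # y_pred: predicted probabilities, shape (batch, H, W, num_classes)
    # class_weights: per-class weights, shape (num_classes,)
    num_classes = y_pred.shape[-1]
    y_onehot = tf.one_hot(y_true, depth=num_classes)   # this is the memory problem
    ce = -y_onehot * tf.math.log(y_pred + 1e-7)        # elementwise cross entropy
    ce = ce * class_weights                            # broadcast the per-class weights
    return tf.reduce_sum(ce, axis=-1)                  # per-pixel loss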

Hamza Yerlikaya
  • check this, maybe helpful: https://stackoverflow.com/questions/59787897/how-does-tensorflow-sparsecategoricalcrossentropy-work – Yefet Mar 06 '21 at 14:03

1 Answer


The SparseCategoricalCrossentropy documentation has a "View Source on GitHub" tab you can click on, which takes you to the implementation. Doing this leads to line 666 of tensorflow.python.keras.losses. From the class definition we can see that it wraps a function sparse_categorical_crossentropy, which is defined on line 4867 of tensorflow.keras.backend. At the bottom of that function definition we can see it is itself a wrapper around tf.nn.sparse_softmax_cross_entropy_with_logits, whose definition lives in tensorflow.python.ops.nn_ops. At the bottom of that function we see it wraps gen_nn_ops.sparse_softmax_cross_entropy_with_logits. If you look for gen_nn_ops you won't find it: it is the name of the *.so file that Python imports to run TensorFlow's C++ op code. So what we are really looking for is a sparse softmax C++ kernel, which can be found in tensorflow.core.kernels.sparse_xent_op.cc. That op calls a functor, which calls a method SparseXentEigenImpl whose implementation can be found in the corresponding header file, sparse_xent_op.h. Starting on line 47 of that file you can see how they create the sparse loss.
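
As a quick sanity check (a sketch of my own, not part of the Keras source), you can call that underlying op directly; note that it takes the integer labels as-is, so no one-hot tensor is ever materialized on the Python side:

import tensorflow as tf

logits = tf.random.normal([2, 33])   # e.g. 2 examples, 33 classes (32 + bg)
labels = tf.constant([5, 32])        # integer class ids, no one-hot needed

per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

# Should agree with the Keras loss when from_logits=True and reduction is disabled.
keras_loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
print(per_example.numpy())
print(keras_loss(labels, logits).numpy())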

// Generator for calculation of the sparse Xent loss.
// This generator takes the logits, the sum of the exponentiated
// logits, and the label indices.  For each minibatch entry, ignoring
// the batch index b, it calculates:
//
//   loss[j] = (log(sum_exp_logits) - logits[j]) * 1{ j == label }
//
// for j = 0 .. num_classes.  This value must be summed over all j for
// the final loss.

And on line 224 there is a comment outlining the loss calculation formula.

//  sum(-labels *
//     ((logits - max_logits) - log(sum(exp(logits - max_logits)))))
//  along classes
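
Putting those two comments together, a rough NumPy translation of the math (not the actual Eigen code) would look like this; the important part for your memory issue is that the label only enters through an index lookup, never through a dense one-hot matrix:

import numpy as np

def sparse_xent(logits, labels):
    # logits: (batch, num_classes) float array; labels: (batch,) integer class ids
    max_logits = logits.max(axis=-1, keepdims=True)
    shifted = logits - max_logits                        # subtract max for numerical stability
    log_sum_exp = np.log(np.exp(shifted).sum(axis=-1))
    # loss[b] = log(sum(exp(logits[b]))) - logits[b, labels[b]]
    return log_sum_exp - shifted[np.arange(len(labels)), labels]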

Not sure if this helps you create your weighted op, but this is how sparse xent is calculated in tensorflow.
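
For the weighting itself, one option (just a sketch, untested on your data) is to take the unreduced per-element sparse loss and gather a per-class weight by label, which again avoids building a one-hot tensor:

import tensorflow as tf

def weighted_sparse_ce(y_true, y_pred, class_weights, from_logits=True):
    # y_true: integer labels, e.g. (batch, H, W); y_pred: (batch, H, W, num_classes)
    # class_weights: 1-D tensor of per-class weights, shape (num_classes,)
    per_element = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=from_logits)             # same shape as y_true
    weights = tf.gather(class_weights, tf.cast(y_true, tf.int32))  # weight per pixel
    return tf.reduce_mean(per_element * weights)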

Edit: There is also a method tf.nn.weighted_cross_entropy_with_logits. Not sure if that will work with your sparsity requirement, but it will probably work better than trying to implement something yourself.

o-90