
I am doing some reading on custom loss functions in TensorFlow and was going through the example provided on the tutorials page (see the link below).

https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough

Here is a simple loss provided in the link.

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def loss(model, x, y, training):
  y_ = model(x, training=training)
  return loss_object(y_true=y, y_pred=y_)

Following this example, the author mentions: "Use the tf.GradientTape context to calculate the gradients used to optimize your model."

My question is: why would one need to use tf.GradientTape? Doesn't TensorFlow compute the gradients automatically when using an optimizer such as Adam?

In fact, I also looked at a previous question posted here:

How to write a custom loss function in Tensorflow?

You can see that none of the answers uses tf.GradientTape. I am sharing one of the answers posted, which makes a lot of sense to me:

import tensorflow as tf
from tensorflow.keras import backend as kb

def focal_loss(y_true, y_pred):
  pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
  pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
  custom_loss = kb.square((pt_1 - pt_0) / 10)
  return custom_loss

model.compile(loss=focal_loss,
              optimizer='adam',
              metrics=['accuracy'])

In contrast, in another, similar question, all of the answers use tf.GradientTape:

Tensorflow 2.0 Custom loss function with multiple inputs

At the moment, I am quite confused. Could someone explain what tf.GradientTape is used for and when I should consider using it?


1 Answer


It all depends on how you are training your model. If you are using model.fit to train your model, you do not have to use tf.GradientTape explicitly, but it is still being used under the hood. If you define a custom training loop like the one in the walkthrough you referenced, you will have to use tf.GradientTape yourself, because it

enables you to retrieve the gradients of the trainable weights of the layer with respect to a loss value. Source
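
For completeness, here is a minimal sketch of what such a custom training step could look like. It reuses the loss_object from the walkthrough you quoted; model is assumed to be any tf.keras model, and train_step is just an illustrative name:

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def train_step(model, x, y):
  # Record the forward pass on the tape so gradients can be computed later
  with tf.GradientTape() as tape:
    y_ = model(x, training=True)
    loss_value = loss_object(y_true=y, y_pred=y_)
  # Gradients of the loss w.r.t. every trainable weight
  grads = tape.gradient(loss_value, model.trainable_variables)
  # The optimizer (here Adam) applies its update rule to those gradients
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss_value

This is essentially what model.fit does for you on every batch when you compile with a loss and an optimizer.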

Now regarding your question: the calculated gradients are, formally, the partial derivatives of the loss with respect to each trainable weight, i.e. a measure of how the loss changes as each weight changes, and your model's optimizer adjusts the individual weights of your model based on these gradients.
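
To make that concrete, here is a tiny self-contained example (the variable x and the function y = x**2 are purely illustrative):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
  y = x ** 2               # y = x^2
dy_dx = tape.gradient(y, x)  # dy/dx = 2x, i.e. 6.0 at x = 3.0
print(dy_dx.numpy())         # prints 6.0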
