I am experimenting with `tf.GradientTape`. I wrote a model with several output layers, each with its own loss, and I want to integrate GradientTape into the training step. My question is: are there specific techniques for passing the several losses to the tape as the gradient target? I know one option is to take the mean of the losses. Is that always necessary? Can't I just pass a list of losses and have GradientTape know which loss belongs to which output layer?
- Gradient tape is simply a tool to calculate the gradient of a tensor with respect to another tensor, and has nothing to do with your model architecture. When you have multiple loss tensors, simply add them together to form the final loss tensor, unless you want to use different optimizers for different losses in the same model. – bui Apr 27 '22 at 07:40
- So I don't have to take the mean of all the losses and use that as my final loss; it should also work with one final tensor containing x losses? – st3ff3n Apr 27 '22 at 08:15
- If you use a tensor as a target for the gradient tape, it will compute the gradient of the _sum_ of that tensor, yes. – xdurch0 Apr 27 '22 at 08:20
- Usually a loss tensor has a dimension of `(batch_size,)`, i.e. the loss itself is *scalar-valued* (a single number) per sample. If you are talking about a [*vector-valued* loss](https://datascience.stackexchange.com/questions/23257/gradient-descent-with-vector-valued-loss), then you need to define what scalar-valued function of that vector you want to minimize. There's no such thing as "minimizing a vector". – bui Apr 27 '22 at 08:43
- I have a reinforcement learning actor-critic setup, and the actor has x dense output layers, one for each action. I calculate a loss for each of these actions, i.e. one loss per output dense layer. So right now, when calling the gradient tape, I give it a tensor with one loss value for each dense layer. Does that work, or do I need to calculate the mean loss over the output layers and pass that into `tape.gradient` as the first parameter? – st3ff3n Apr 27 '22 at 10:09
- You can train a model with multiple outputs and losses, and provide loss weights during model compilation. You may refer to [this](https://pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/) link for reference. – Swapnil Masurekar Apr 27 '22 at 15:34
- Ok, let's assume that you have a tensor, each element of which is a separate loss. Then you use gradient tape to get the gradient of this loss tensor w.r.t. the model's weights. Now how would you update the model's weights using that gradient? – bui Apr 28 '22 at 04:26
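Putting the comments together, a minimal sketch of the suggested approach is to compute every output's loss inside a single tape, reduce them to one scalar, and take the gradient of that scalar. The tiny two-headed model, the random data, and the Adam optimizer below are made-up placeholders, not part of the original question:

    import tensorflow as tf

    # hypothetical two-headed model standing in for "several output layers"
    inputs = tf.keras.Input(shape=(8,))
    hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
    out1 = tf.keras.layers.Dense(1, name="out1")(hidden)
    out2 = tf.keras.layers.Dense(1, name="out2")(hidden)
    model = tf.keras.Model(inputs, [out1, out2])

    opt = tf.keras.optimizers.Adam()
    mse = tf.keras.losses.MeanSquaredError()

    # placeholder data
    x = tf.random.normal((32, 8))
    y1 = tf.random.normal((32, 1))
    y2 = tf.random.normal((32, 1))

    with tf.GradientTape() as tape:
        pred1, pred2 = model(x, training=True)
        loss1 = mse(y1, pred1)        # scalar loss for head 1
        loss2 = mse(y2, pred2)        # scalar loss for head 2
        total_loss = loss1 + loss2    # one scalar target for the tape

    grads = tape.gradient(total_loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

Summing versus averaging the per-output losses only rescales the combined gradient; the point the commenters make is that the first argument to `tape.gradient` should be a single scalar (or it will be treated as the sum of its elements).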
1 Answer
From the TensorFlow documentation: unless you set `persistent=True`, a `GradientTape` can only be used to compute one set of gradients.
To calculate gradients for multiple losses, you need multiple tapes. Something like:
    with tf.GradientTape() as t1:
        # the forward pass that produces `pred` must also run inside this tape,
        # otherwise the gradients w.r.t. var_list1 will be None
        loss1_result = loss1(true, pred)
    grads1 = t1.gradient(loss1_result, var_list1)

    with tf.GradientTape() as t2:
        loss2_result = loss2(true, pred)
    grads2 = t2.gradient(loss2_result, var_list2)
Then apply the gradients with the corresponding optimizers:
    opt1.apply_gradients(zip(grads1, var_list1))
    opt2.apply_gradients(zip(grads2, var_list2))
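Alternatively, since the quoted documentation mentions `persistent=True`, a single persistent tape can be reused for both gradient calls. A rough sketch under the same assumptions as above (`loss1`, `loss2`, `true`, `pred`, `var_list1`, `var_list2`, `opt1`, `opt2` already exist, and the forward pass producing `pred` runs inside the tape):

    with tf.GradientTape(persistent=True) as tape:
        # record the forward pass and both losses on the same tape
        loss1_result = loss1(true, pred)
        loss2_result = loss2(true, pred)

    grads1 = tape.gradient(loss1_result, var_list1)
    grads2 = tape.gradient(loss2_result, var_list2)  # second call is allowed because persistent=True
    del tape  # drop the persistent tape to release its recorded resources

    opt1.apply_gradients(zip(grads1, var_list1))
    opt2.apply_gradients(zip(grads2, var_list2))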

man hou