I am working on my own implementation of DeepMind's DQN paper in TensorFlow and am running into difficulty with clipping the loss function.
Here is an excerpt from the Nature paper describing the loss clipping:
We also found it helpful to clip the error term from the update to be between −1 and 1. Because the absolute value loss function |x| has a derivative of −1 for all negative values of x and a derivative of 1 for all positive values of x, clipping the squared error to be between −1 and 1 corresponds to using an absolute value loss function for errors outside of the (−1,1) interval. This form of error clipping further improved the stability of the algorithm.
(link to full paper: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
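If I am parsing this correctly, the "squared error inside (-1, 1)" combined with an "absolute value loss outside" would be a piecewise function of the error term x = target - prediction, something like this (my own rewriting, not a formula from the paper; the 0.5 is my addition so the derivatives match at +/-1):

    L(x) = 0.5 * x^2     if |x| <= 1
    L(x) = |x| - 0.5     if |x| > 1

whose derivative is x inside [-1, 1] and +/-1 outside, i.e. the error term clipped to [-1, 1]. But I am not sure this is what they actually implemented, which is part of what I am asking below.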
What I have tried so far is using
clipped_loss_vec = tf.clip_by_value(loss, -1, 1)
to clip the loss I calculate to between -1 and +1. With this, the agent does not learn the proper policy. I printed out the gradients of the network and realized that as soon as the loss falls below -1, the gradients all suddenly turn to 0!
My reasoning for this is that the clipped loss is constant on (-inf, -1) U (1, inf), which means it has zero gradient in those regions. This in turn makes the gradients throughout the network zero (think of it this way: whatever input image I feed the network, the loss stays pinned at -1 in a local neighborhood because it has been clipped). A minimal sketch reproducing this is below.
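Here is a toy example (stand-in values instead of my actual network, just to inspect the gradient behaviour) showing what I mean:

    import tensorflow as tf

    # toy "prediction" and target, only to inspect the gradients
    q_value = tf.Variable(5.0)
    target = tf.constant(1.0)

    loss = tf.square(target - q_value)               # 16.0, well outside [-1, 1]
    clipped_loss = tf.clip_by_value(loss, -1.0, 1.0)

    grad_raw = tf.gradients(loss, [q_value])[0]
    grad_clipped = tf.gradients(clipped_loss, [q_value])[0]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run([grad_raw, grad_clipped]))    # prints [8.0, 0.0]

The gradient through the clipped loss is exactly zero, which matches what I see in the full network.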
So, my question has two parts:
1. What exactly did DeepMind mean in the excerpt? Did they mean that a loss below -1 is clipped to -1 and a loss above +1 is clipped to +1? If so, how did they deal with the gradients (i.e. what is all that part about absolute value functions)?
2. How should I implement loss clipping in TensorFlow such that the gradients do not go to zero outside the clipped range (but instead stay at +1 and -1)? My current best guess is sketched below. Thanks!
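For reference, based on my reading of the excerpt above, here is what I would currently try: clip the error term (target - prediction) rather than the loss, by using the squared error where |error| <= 1 and a shifted absolute error elsewhere, so the gradient stays at +/-1 instead of dropping to 0. I am not at all sure this is what DeepMind actually did, hence the question. The variable names are just placeholders for this sketch:

    import tensorflow as tf

    # placeholders standing in for my targets and predicted Q-values
    target = tf.placeholder(tf.float32, [None])
    q_value = tf.placeholder(tf.float32, [None])

    error = target - q_value

    # 0.5 * error^2 where |error| <= 1, and |error| - 0.5 where |error| > 1,
    # so the derivative w.r.t. the error is the error itself inside [-1, 1]
    # and +/-1 outside, i.e. the "clipped error term"
    quadratic = tf.clip_by_value(tf.abs(error), 0.0, 1.0)
    linear = tf.abs(error) - quadratic
    loss = tf.reduce_mean(0.5 * tf.square(quadratic) + linear)

Is this (or something like it) the right way to read the paper?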