
Problem: a very long RNN:

N1 -- N2 -- ... -- N100

For an optimizer like AdamOptimizer, compute_gradients() returns gradients for all trainable variables.

However, the gradients might explode at some step.

A method like the one in how-to-effectively-apply-gradient-clipping-in-tensor-flow can clip the large final gradients.
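Roughly, that final-gradient clipping looks like this (a TF 2-style sketch; the variable and loss here are made up for illustration):

```python
import tensorflow as tf

# Hypothetical setup: a variable whose loss produces a huge gradient.
w = tf.Variable([1.0, 2.0])
opt = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(1000.0 * w * w)

grads = tape.gradient(loss, [w])                  # gradient is [2000, 4000]: far too large
clipped, _ = tf.clip_by_global_norm(grads, 5.0)   # rescale so the global norm is 5
opt.apply_gradients(zip(clipped, [w]))
```

This bounds the overall update, but the intermediate backward values inside the chain are still unclipped, which is exactly the problem below.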

But how to clip those intermediate ones?

One way might be to manually do the backprop from "N100 --> N99", clip the gradients, then from "N99 --> N98", and so on, but that's just too complicated.

So my question is: is there any easier method to clip the intermediate gradients? (Strictly speaking, they are no longer gradients in the mathematical sense.)

Engineero
    Rough idea -- wrap each of your layers into a py_func that uses custom gradient as done [here](https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342). The custom gradient function would take vector of backward values and return the clipped version. – Yaroslav Bulatov Oct 13 '16 at 18:08
  • clipping weights and/or activations might also help to prevent large gradients – gizzmole Jun 27 '17 at 16:56

2 Answers

```
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
  # Forward: identity; backward: clip the incoming gradient's norm to 10.
  return x, lambda dy: tf.clip_by_norm(dy, 10.0)
```
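Applied in TF 2 eager mode, the clipped backward pass can be checked with a toy example (the definition is repeated so the snippet runs on its own; the input values are made up):

```python
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Forward: identity; backward: clip the incoming gradient's norm to 10.
    return x, lambda dy: tf.clip_by_norm(dy, 10.0)

x = tf.constant([3.0, 4.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    # Without clipping, the gradient of y w.r.t. x would be [100, 100] (norm ~141).
    y = tf.reduce_sum(100.0 * gradient_clipping(x))

g = tape.gradient(y, x)  # clipped to norm 10 in the backward pass
```

The forward value is untouched; only the backward signal is rescaled.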
Hanhan Li

You can use the custom_gradient decorator to make a version of tf.identity that clips the intermediate gradients in the backward pass.

```
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
  result = tf.identity(tensor)

  def grad(dresult):
    # Clip the backward signal; None is the "gradient" for max_norm.
    return tf.clip_by_norm(dresult, max_norm), None

  return result, grad
```

Then use gradient_clipping_identity as you'd normally use identity and your gradients will be clipped in the backward pass.
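For instance, in an unrolled recurrent chain you would insert it after every step. A TF 2 sketch (tf.custom_gradient replaces the contrib-eager import, which is TF 1.x only; the chain and values here are toy assumptions):

```python
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
    result = tf.identity(tensor)

    def grad(dresult):
        # Clip the backward signal; None is the "gradient" for max_norm.
        return tf.clip_by_norm(dresult, max_norm), None

    return result, grad

# Unrolled recurrent chain: clip the backward signal after every step.
w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    h = tf.constant(1.0)
    for _ in range(5):
        h = w * h                                # one "recurrent" step
        h = gradient_clipping_identity(h, 1.0)   # cap the per-step gradient norm at 1

grad_w = tape.gradient(h, w)  # 31.0 here, vs. 80.0 without the clipping identities
```

Each backward step is rescaled before it can compound, which is what prevents the intermediate explosion the question asks about.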

Alexandre Passos