
I have a gradient entering layer L1 from layers L2_1 and L2_2 at the same time, and I need to rescale the combined gradient (L2_1 + L2_2) by 1/sqrt(2) before it enters L1. How can I do this?

My network looks something like this:

                L2_1
               /    \
input -> L0 - L1     L_final
               \    /
                L2_2
userqwerty1

1 Answer


You can divide the L2_1 and L2_2 outputs by sqrt(2). That will rescale both the activations and the backprop. If you want to modify only the backprop but not the activations, you can use the gradient replacement trick from here
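For concreteness, here is a minimal sketch of both options in TensorFlow 1.x-style graph code. The helper name `scale_grad_only`, the placeholder tensors, and their shapes are illustrative assumptions, not part of the original answer:

    import math
    import tensorflow as tf

    def scale_grad_only(x, scale):
        # Forward pass returns x unchanged: scale*x + (x - scale*x) == x.
        # Backward pass: the stop_gradient term contributes no gradient,
        # so only the `scale * x` term is differentiated through.
        x_scaled = scale * x
        return x_scaled + tf.stop_gradient(x - x_scaled)

    # Illustrative stand-ins for the outputs of L2_1 and L2_2.
    L2_1 = tf.placeholder(tf.float32, [None, 128])
    L2_2 = tf.placeholder(tf.float32, [None, 128])

    s = 1.0 / math.sqrt(2.0)

    # Option 1: rescale both the activations and the gradient.
    L2_1_scaled = s * L2_1
    L2_2_scaled = s * L2_2

    # Option 2: rescale only the gradient; forward activations stay the same.
    L2_1_grad_scaled = scale_grad_only(L2_1, s)
    L2_2_grad_scaled = scale_grad_only(L2_2, s)

    # Feed the chosen pair into the layer that produces L_final.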

Yaroslav Bulatov
  • `L2_1_t = 1/sqrt(2)*L2_1; L2_1_y = L2_1_t + tf.stop_gradient(L2_1 - L2_1_t)` and `L2_2_t = 1/sqrt(2)*L2_2; L2_2_y = L2_2_t + tf.stop_gradient(L2_2 - L2_2_t)`, and in the model construction code I would use `L2_1_y` and `L2_2_y` in place of `L2_1` and `L2_2` (as input to the next layer) – is this right? – userqwerty1 May 09 '16 at 16:47
  • Looks right at first glance, but feel free to update this Q if you try it and it works, since others may have the same request – Yaroslav Bulatov May 09 '16 at 16:55