Gradients flow back into layer L1 from layers L2_1 and L2_2 at the same time. I need to rescale the summed gradient (from L2_1 + L2_2) by 1/sqrt(2) before it enters L1. How can I do this?
My network looks something like this:
                     L2_1
                    /    \
    input -> L0 - L1      L_final
                    \    /
                     L2_2
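
To make the goal concrete, here is a minimal scalar sketch of the behavior being asked for, not tied to any framework. It assumes each layer is just a multiplication by a single weight and that L_final adds the two branch outputs. The weights `w0`, `w1`, `w21`, `w22` and the functions `forward` and `backward` are hypothetical names made up for illustration. The point it shows is that the gradients from L2_1 and L2_2 add up at L1's output, and the rescaling is applied to that sum before anything upstream sees it.

```python
import math

# Hypothetical scalar "layers": each layer multiplies its input by one weight.
w0, w1, w21, w22 = 0.5, 2.0, 3.0, -1.0


def forward(x):
    h0 = w0 * x    # L0
    h1 = w1 * h0   # L1
    a = w21 * h1   # L2_1
    b = w22 * h1   # L2_2
    out = a + b    # L_final (assumed here to add both branches)
    return h0, h1, a, b, out


def backward(x, scale=1.0):
    """Return (d out / d w1, d out / d x), with the summed branch
    gradient multiplied by `scale` before it enters L1."""
    h0, h1, a, b, out = forward(x)
    g_a = 1.0                        # d out / d a
    g_b = 1.0                        # d out / d b
    g_h1 = g_a * w21 + g_b * w22     # both branch gradients add at L1's output
    g_h1 *= scale                    # the rescaling step from the question
    g_w1 = g_h1 * h0                 # gradient for L1's weight
    g_x = g_h1 * w1 * w0             # gradient passed on through L1 and L0
    return g_w1, g_x


plain = backward(1.0)
scaled = backward(1.0, scale=1.0 / math.sqrt(2))
print(plain, scaled)
```

Everything at or before L1 (here `g_w1` and `g_x`) ends up scaled by 1/sqrt(2), while the forward output is unchanged. In an autodiff framework the same effect is typically obtained by placing an identity operation on L1's output whose backward pass multiplies the incoming gradient by the scale. Which mechanism to use depends on the framework, which the question does not name.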