According to He et al. (2015), Eq. 15, the theoretical weight variance for one layer when using ReLU is:
n * Var[W] = 2
where n is the layer size.
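As a minimal NumPy sketch of that rule (the layer sizes 256 and 128 are assumptions for illustration), you draw the weights from a zero-mean Gaussian with Var[W] = 2/n:

import numpy as np

n = 256                                         # layer size (fan-in), assumed for illustration
std = np.sqrt(2.0 / n)                          # so that n * Var[W] = 2
W = np.random.normal(0.0, std, size=(n, 128))   # 128 output units, also assumed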
If you instead want to pool the variances of the in layer and the out layer, the bounds become:
(fan_in, fan_out) = ...
low = -2*np.sqrt(1.0/(fan_in + fan_out))
high = 2*np.sqrt(1.0/(fan_in + fan_out))
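As a rough usage sketch (the concrete fan_in/fan_out values here are an assumption), you can then sample the weight matrix uniformly within those bounds:

import numpy as np

(fan_in, fan_out) = (256, 128)                   # assumed sizes, for illustration only
low = -2 * np.sqrt(1.0 / (fan_in + fan_out))
high = 2 * np.sqrt(1.0 / (fan_in + fan_out))
W = np.random.uniform(low, high, size=(fan_in, fan_out))  # draw weights within the bounds above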
If you are using TensorFlow, it provides a variance_scaling_initializer, where you can set the factor and mode arguments to control how the weights are initialized.
With the default factor=2.0, you get the initialization variance suggested by He et al. (2015) for ReLU activations, and you can change the mode argument to get slightly different weight initialization variances. Only use in layer:
tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN')
would give you the following:
(fan_in, fan_out) = ...
low = -np.sqrt(2.0/fan_in)
high = np.sqrt(2.0/fan_in)
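For instance, a minimal TF 1.x usage sketch (the variable name and shape are assumptions), passing this initializer to tf.get_variable:

import tensorflow as tf

he_init = tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_IN')
W = tf.get_variable('W', shape=[256, 128], initializer=he_init)  # shape assumed for illustration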
Use both in and out layers:
tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_AVG')
would give you:
(fan_in, fan_out) = ...
low = -np.sqrt(4.0/(fan_in+fan_out)) = -2.0*np.sqrt(1.0/(fan_in+fan_out))
high = np.sqrt(4.0/(fan_in+fan_out)) = 2.0*np.sqrt(1.0/(fan_in+fan_out))
Only use out layer:
tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode='FAN_OUT')
would give you:
(fan_in, fan_out) = ...
low = -np.sqrt(2.0/fan_out)
high = np.sqrt(2.0/fan_out)