
He / MSRA initialization, from Delving Deep into Rectifiers, seems to be a recommended weight initialization when using ReLUs.

Is there a built-in way to use this in TensorFlow (similar to: How to do Xavier initialization on TensorFlow)?


1 Answer


TensorFlow 2.0

tf.keras.initializers.HeUniform()

or

tf.keras.initializers.HeNormal()

See docs for usage. (h/t to @mable)
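
A minimal sketch of using one of these with the Keras API (the layer size and shapes here are illustrative, not from the docs):

initializer = tf.keras.initializers.HeNormal()

# Pass it to a layer ...
layer = tf.keras.layers.Dense(256, activation='relu',
            kernel_initializer=initializer)

# ... or draw a weight tensor from it directly.
W1 = tf.Variable(initializer(shape=(784, 256)))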

TensorFlow 1.0

tf.contrib.layers.variance_scaling_initializer(dtype=tf.float32)

This will give you He / MSRA initialization. The documentation states that the default arguments of tf.contrib.layers.variance_scaling_initializer correspond to He initialization, and that changing the arguments yields Xavier initialization (this is what TF's internal implementation of Xavier initialization does).
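
A rough sketch of that argument mapping (mirroring what the TF 1.x Xavier initializer does internally; treat the exact argument values as something to verify against the docs):

# He / MSRA: the defaults (factor=2.0, mode='FAN_IN', truncated normal).
he_init = tf.contrib.layers.variance_scaling_initializer()

# Approximately Xavier: average the fan-in and fan-out, sample uniformly.
xavier_like_init = tf.contrib.layers.variance_scaling_initializer(
    factor=1.0, mode='FAN_AVG', uniform=True)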

Example usage:

W1 = tf.get_variable('W1', shape=[784, 256],
       initializer=tf.contrib.layers.variance_scaling_initializer())

or

initializer = tf.contrib.layers.variance_scaling_initializer()
W1 = tf.Variable(initializer([784,256]))
    For all who stumble across this: By now, the activation is available in `tf.keras.initializers.HeNormal` (or `tf.keras.initializers.VarianceScaling` using default parameters) – mable Dec 02 '20 at 11:52
  • Could you please explain what is the difference between the `tf.keras.initializers.HeUniform()` and `tf.keras.initializers.HeNormal()`? – Hong Cheng Jul 01 '21 at 01:00
  • HeUniform draws the weights from a uniform distribution U(-x, x). HeNormal draws them from a normal distribution N(0, x), or something like that, where x is a small value determined by He or Xavier or whatnot. The documentation should explain it. – matwilso Jul 01 '21 at 03:38
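
For concreteness, a small sketch of the scales behind the two variants as the Keras docs describe them (fan_in below is an illustrative value):

import math

fan_in = 784  # number of input units feeding the layer (illustrative)

# HeNormal: truncated normal with mean 0 and stddev = sqrt(2 / fan_in)
he_normal_stddev = math.sqrt(2.0 / fan_in)

# HeUniform: uniform in [-limit, limit] with limit = sqrt(6 / fan_in)
he_uniform_limit = math.sqrt(6.0 / fan_in)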