I'm porting my Caffe network over to TensorFlow, but it doesn't seem to have Xavier initialization. I'm using `truncated_normal`, but this seems to be making it a lot harder to train.
-
Xavier is the default initialization. See https://stackoverflow.com/questions/37350131/what-is-the-default-variable-initializer-in-tensorflow – Thomas Ahle Mar 15 '18 at 23:01
10 Answers
Since version 0.8 there is a Xavier initializer, see here for the docs.
You can use something like this:
W = tf.get_variable("W", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer())
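If you prefer the normal-distribution variant of Xavier over the uniform one, the same contrib initializer takes a uniform flag; a minimal sketch, assuming the TF 1.x contrib API where xavier_initializer(uniform=False) switches to the truncated-normal variant:
W = tf.get_variable("W_norm", shape=[784, 256],
                    initializer=tf.contrib.layers.xavier_initializer(uniform=False))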
-
do you know how to do this without giving the shape to `get_variable` but instead giving it to the initializer? I used to have `tf.truncated_normal(shape=[dims[l-1],dims[l]], mean=mu[l], stddev=std[l], dtype=tf.float64)` and I specified the shape there, but now your suggestion sort of screws my code up. Do you have any suggestions? – Charlie Parker Jul 25 '16 at 20:12
-
@Pinocchio you can simply write yourself a wrapper which has the same signature as `tf.Variable(...)` and uses `tf.get_variable(...)` – jns Aug 23 '16 at 10:17
-
2"Current" link without version: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer – scipilot Aug 27 '17 at 13:33
Just to add another example of how to define a tf.Variable initialized using Xavier and Yoshua's method:
graph = tf.Graph()
with graph.as_default():
    ...
    initializer = tf.contrib.layers.xavier_initializer()
    w1 = tf.Variable(initializer(w1_shape))
    b1 = tf.Variable(initializer(b1_shape))
    ...
This prevented me from having NaN values in my loss function due to numerical instabilities when using multiple layers with ReLUs.

-
This format fitted my code best - and it's allowed me to return my learning rate to 0.5 (I had to lower it to 0.06 when adding another relu'd layer). Once I'd applied this initialiser to ALL hidden layers, I'm getting incredibly high validation rates right from the first few hundred epochs. I can't believe the difference it's made! – scipilot Aug 27 '17 at 14:12
In TensorFlow 2.0 and later, both tf.contrib.* and tf.get_variable() are deprecated. In order to do Xavier initialization you now have to switch to:
init = tf.initializers.GlorotUniform()
var = tf.Variable(init(shape=shape))
# or a one-liner with slightly confusing brackets
var = tf.Variable(tf.initializers.GlorotUniform()(shape=shape))
Glorot uniform and Xavier uniform are two different names for the same initialization type. If you want to know more about how to use initializations in TF2.0 with or without Keras, refer to the documentation.
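If you build models with Keras layers, the same initializer can also be passed via kernel_initializer; a small sketch, assuming tf.keras in TF 2.x (Glorot uniform is already the default for Dense layers, so this is mostly for making the seed explicit):
import tensorflow as tf

layer = tf.keras.layers.Dense(
    256, activation="relu",
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42))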

-
I used the above code and got an error like the one below: _init_xavier = tf.Variable(init(shape=shape)) NameError: name 'shape' is not defined – Chiranga Oct 13 '20 at 17:01
@Aleph7, Xavier/Glorot initialization depends on the number of incoming connections (fan_in), the number of outgoing connections (fan_out), and the kind of activation function (sigmoid or tanh) of the neuron. See this: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
So now, to your question. This is how I would do it in TensorFlow:
(fan_in, fan_out) = ...
low = -4*np.sqrt(6.0/(fan_in + fan_out)) # use 4 for sigmoid, 1 for tanh activation
high = 4*np.sqrt(6.0/(fan_in + fan_out))
return tf.Variable(tf.random_uniform(shape, minval=low, maxval=high, dtype=tf.float32))
Note that we should be sampling from a uniform distribution, and not the normal distribution as suggested in the other answer.
Incidentally, I wrote a post yesterday for something different using TensorFlow that happens to also use Xavier initialization. If you're interested, there's also a python notebook with an end-to-end example: https://github.com/delip/blog-stuff/blob/master/tensorflow_ufp.ipynb

-
That paper studies the behavior of weight gradients under different activation functions with the commonly used initialization. Then they propose a universal initialization that works regardless of the activation function. Furthermore, your method does not depend on the activation function either, so it's better to use the built-in Xavier initialization in Tensorflow. – Vahid Mirjalili Mar 17 '17 at 14:06
A nice wrapper around tensorflow called prettytensor gives an implementation in the source code (copied directly from here):
def xavier_init(n_inputs, n_outputs, uniform=True):
  """Set the parameter initialization using the method described.
  This method is designed to keep the scale of the gradients roughly the same
  in all layers.
  Xavier Glorot and Yoshua Bengio (2010):
           Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
           statistics.
  Args:
    n_inputs: The number of input nodes into each output.
    n_outputs: The number of output nodes for each input.
    uniform: If true use a uniform distribution, otherwise use a normal.
  Returns:
    An initializer.
  """
  if uniform:
    # 6 was used in the paper.
    init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
    return tf.random_uniform_initializer(-init_range, init_range)
  else:
    # 3 gives us approximately the same limits as above since this repicks
    # values greater than 2 standard deviations from the mean.
    stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
    return tf.truncated_normal_initializer(stddev=stddev)
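A quick usage sketch of the wrapper above (my own example; it assumes import math, import tensorflow as tf, and the TF 1.x get_variable API):
W = tf.get_variable("W", shape=[784, 256],
                    initializer=xavier_init(784, 256, uniform=True))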

TF-contrib has xavier_initializer. Here is an example of how to use it:
import tensorflow as tf
a = tf.get_variable("a", shape=[4, 4], initializer=tf.contrib.layers.xavier_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
In addition to this, tensorflow has other initializers; a few of them are sketched below.
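For example (my own non-exhaustive selection of initializers from the TF 1.x API, not the answer's original list):
zeros = tf.get_variable("zeros", shape=[4], initializer=tf.zeros_initializer())
const = tf.get_variable("const", shape=[4], initializer=tf.constant_initializer(0.1))
trunc = tf.get_variable("trunc", shape=[4, 4],
                        initializer=tf.truncated_normal_initializer(stddev=0.02))
scaled = tf.get_variable("scaled", shape=[4, 4],
                         initializer=tf.variance_scaling_initializer())  # variance-scaling family (Xavier/He are special cases)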

-
Thanks, sir, this was very helpful. I want to ask you if I can initialize the **bias** using **xavier_initializer** – Sakhri Houssem Feb 11 '18 at 19:41
I looked and I couldn't find anything built in. However, according to this:
http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
Xavier initialization is just sampling from a (usually Gaussian) distribution whose variance is a function of the number of neurons. tf.random_normal can do that for you; you just need to compute the stddev (i.e., from the number of neurons represented by the weight matrix you're trying to initialize).
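A minimal sketch of that computation (my own example, assuming a weight matrix of shape [fan_in, fan_out] and the Xavier-normal rule stddev = sqrt(2 / (fan_in + fan_out)), with the TF 1.x tf.random_normal op):
import numpy as np
import tensorflow as tf

fan_in, fan_out = 784, 256  # example layer sizes
stddev = np.sqrt(2.0 / (fan_in + fan_out))
W = tf.Variable(tf.random_normal([fan_in, fan_out], stddev=stddev))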

Via the kernel_initializer parameter to tf.layers.conv2d, tf.layers.conv2d_transpose, tf.layers.Dense, etc., e.g.
layer = tf.layers.conv2d(
    input, 128, 5, strides=2, padding='SAME',
    kernel_initializer=tf.contrib.layers.xavier_initializer())
https://www.tensorflow.org/api_docs/python/tf/layers/conv2d
https://www.tensorflow.org/api_docs/python/tf/layers/conv2d_transpose
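The same parameter works for dense layers too; a small sketch I added using the TF 1.x tf.layers.dense function, where layer_input is a placeholder name for the previous layer's output:
logits = tf.layers.dense(
    layer_input, 10,
    kernel_initializer=tf.contrib.layers.xavier_initializer())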

Just in case you want to use one line as you do with:
W = tf.Variable(tf.truncated_normal((n_prev, n), stddev=0.1))
You can do:
W = tf.Variable(tf.contrib.layers.xavier_initializer()((n_prev, n)))

Tensorflow 1:
W1 = tf.get_variable("W1", [25, 12288],
                     initializer=tf.contrib.layers.xavier_initializer(seed=1))
Tensorflow 2:
W1 = tf.get_variable("W1", [25, 12288],
                     initializer=tf.random_normal_initializer(seed=1))
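For what it's worth, if you want Xavier/Glorot behaviour specifically in TF 2.x (rather than a plain random normal), a closer equivalent would be the Keras Glorot initializers; a small sketch using tf.keras.initializers.GlorotNormal:
init = tf.keras.initializers.GlorotNormal(seed=1)
W1 = tf.Variable(init(shape=[25, 12288]))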
