In trying to learn a bit about TensorFlow, I have been building a Variational Autoencoder. It works, but I noticed that, after training, I was getting different results from two decoders that share the same variables.

I created two decoders because the first is trained against my dataset, while the second will eventually be fed new Z encodings in order to produce new values.

My check is that I should be able to send the Z values generated by the encoding process to both decoders and get equal results.

I have two decoders (D, D_new). D_new reuses the variable scope of D.

Before training, I can send values into the encoder (E) to generate output values as well as the Z values it produces (Z_gen).

If I use Z_gen as input to D_new before training, then its output is identical to the output of D, which is expected.

After a few iterations of training, however, the output of D begins to diverge from that of D_new (although they remain quite similar).

I have pared this down to a simpler version of my code which still reproduces the error. I'm wondering if others have found this to be the case and where I might be able to correct for it.

The code below can be run in a Jupyter notebook. I'm using TensorFlow r0.11 and Python 3.5.0.



    import numpy as np
    import tensorflow as tf
    import matplotlib
    import matplotlib.pyplot as plt
    import os
    import pylab as pl
    # enable inline plotting in the notebook
    mgc = get_ipython().magic
    mgc(u'matplotlib inline')
    pl.rcParams['figure.figsize'] = (8.0, 5.0)



    ##-- Helper function just for visualizing the data
    def plot_values(values, file=None):
        t = np.linspace(1.0,len(values[0]),len(values[0]))
        for i in range(len(values)):
            plt.plot(t,values[i])
        if file is None:
            plt.show()
        else:
            plt.savefig(file)
        plt.close()



    def encoder(input, n_hidden, n_z):
        with tf.variable_scope("ENCODER"):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("Z"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_z
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Z", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    Z_gen = tf.matmul(prevLayer,w)
        return Z_gen

    def decoder(input, n_hidden, n_outputs, reuse=False):
        with tf.variable_scope("DECODER", reuse=reuse):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("OUTPUT"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_outputs
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Output", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    out = tf.sigmoid(tf.matmul(prevLayer,w))
        return out

Here is where the Tensorflow graph is setup:



    batch_size = 3
    n_inputs = 100
    n_hidden_nodes = 12
    n_z = 2

    with tf.variable_scope("INPUT_VARS"):
        with tf.name_scope("X"):
            X = tf.placeholder(tf.float32, shape=(None, n_inputs))
        with tf.name_scope("Z"):
            Z = tf.placeholder(tf.float32, shape=(None, n_z))

    Z_gen = encoder(X,n_hidden_nodes,n_z)

    D = decoder(Z_gen, n_hidden_nodes, n_inputs)
    D_new = decoder(Z, n_hidden_nodes, n_inputs, reuse=True)

    with tf.name_scope("COST"):
        loss = -tf.reduce_mean(X * tf.log(1e-6 + D) + (1-X) * tf.log(1e-6 + 1 - D))
        train_step = tf.train.AdamOptimizer(0.001, beta1=0.5).minimize(loss)

I'm generating a training set of 3 samples of normal-distribution noise with 100 data points each, then sorting it to make it easier to visualize:



    train_data = (np.random.normal(0,1,(batch_size,n_inputs)) + 3) / 6.0
    train_data.sort()
    plot_values(train_data)

[Figure: plot of the three sorted training curves]

Start up the session:



    sess = tf.InteractiveSession()
    sess.run(tf.group(tf.initialize_all_variables(), tf.initialize_local_variables()))

Let's just look at what the network generates before training...



    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[Figure: initial (untrained) output of D]

Pulling out the generated Z values and feeding them to D_new, which reuses the variables from D:



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[Figure: output of D_new for the same Z values, identical to D]

Just for sanity, I'll plot the difference between the two to be sure they're the same (at this point the difference is zero everywhere).
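
The call producing that plot isn't shown in the original post; presumably it is the same one used again after training:

    plot_values(resultA - resultB)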

Now run 1000 training epochs and plot the result...



    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[Figure: output of D after 1000 training steps]

Now let's feed those same Z values to D_new and plot those results...



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[Figure: output of D_new for the same Z values after training]

They look pretty similar. But (I think) they should be exactly the same. Let's look at the difference...



    plot_values(resultA - resultB)

[Figure: resultA - resultB, showing small nonzero differences]

You can see there is some variation now. It becomes much more dramatic with a larger network on more complex data, but it still shows up in this simple example. Any clues as to what's going on?
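
For reference, one quick way to confirm that reuse=True really is sharing the decoder variables (rather than silently creating a second set) would be a sketch like the one below, which just lists the trainable variables living under the DECODER scope; this is not output from the run above:

    # With proper reuse there should be exactly one DECODER/D_Hidden
    # and one DECODER/D_Output variable, with no duplicates.
    for v in tf.trainable_variables():
        if v.name.startswith("DECODER"):
            print(v.name, v.get_shape())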

  • I'm not sure that this is an actual answer, because I still see these results in my more complex example. But I find that if I sample the Z_gen values again, after the training loop, then the results from D and D_new match. When I try this with more complex data (a training set of sine waves) I still see some variation. It seems like there is some very minor change in values once the run() method completes. – P-Rod Nov 22 '16 at 23:11

2 Answers


There are some methods (I don't know which ones specifically) that can be supplied with a seed value. Beyond that, I'm not even sure the training process is completely deterministic, especially when a GPU is involved, simply by the nature of parallelization.

See this question.
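
For example (a sketch, not code from the question): TensorFlow has a graph-level seed and NumPy has its own, and pinning both removes at least the initialization-related randomness, although GPU kernels can still be nondeterministic:

    import numpy as np
    import tensorflow as tf

    np.random.seed(0)       # controls the NumPy-generated training data
    tf.set_random_seed(0)   # graph-level seed; set this before building the graph

    # Individual random ops also accept an op-level seed, e.g.:
    # init = tf.random_uniform_initializer(-0.1, 0.1, seed=0)
    # w = tf.get_variable("W", shape=[10, 10], initializer=init)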

– Sven

While I don't have a full explanation for the reason why, I was able to resolve my issue by changing:

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

to...

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})

    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})

    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

Note that I simply ran and extracted resultA and Z_vals one last time, without train_step. (Presumably, when train_step is fetched in the same sess.run call as D and Z_gen, the returned values are computed from the weights before the optimizer update is applied, so after the loop they lag the final weights by one step, while D_new, evaluated afterwards, sees the updated weights.)

The reason I was still seeing problems in my more complex setup was that I had bias variables (even though they were set to 0.0) that were being generated with...

    b = tf.Variable(tf.constant(self.bias_k, shape=[n_layer_outputs], dtype=tf.float32))

Variables created directly with tf.Variable are not tracked by tf.variable_scope, so reuse=True does not share them: each decoder ended up with its own copy of the bias, meaning those variables were technically not being reused. Why they presented such a problem even when set to 0.0 I'm not sure, though presumably training moves D's copy of the bias away from 0.0 while D_new's copy stays at its initial value.
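
A sketch of the corresponding fix (assuming the decoder() function from the question, so reuse and n_layer_outputs come from its arguments): create the bias with tf.get_variable inside the decoder's variable scope, so that reuse=True actually shares it between D and D_new:

    with tf.variable_scope("DECODER", reuse=reuse):
        # Participates in variable sharing, unlike tf.Variable(...)
        b = tf.get_variable(name="D_Bias",
                            shape=[n_layer_outputs],
                            dtype=tf.float32,
                            initializer=tf.constant_initializer(0.0))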

– P-Rod