
I would like to get reproducible results for my TensorFlow runs. The way I'm trying to make this happen is to set the NumPy and TensorFlow seeds:

import numpy as np
rnd_seed = 1
np.random.seed(rnd_seed)

import tensorflow as tf
tf.set_random_seed(rnd_seed)

I also make sure that the weights of the neural network, which I initialize with tf.truncated_normal, use that seed: tf.truncated_normal(..., seed=rnd_seed)

For reasons that are beyond the scope of this question, I'm using the sampled softmax loss function, tf.nn.sampled_softmax_loss, and unfortunately, I'm not able to control the stochasticity of this function with a random seed.

Looking at the TensorFlow documentation of this function (https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss), I can see that the sampled_values parameter should be the only one that affects randomization, but I'm not able to understand how to actually use a seed with it.
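
My best guess, after reading that page, is that I would have to build the candidate sampler myself with a seed and pass its output through sampled_values, roughly like the sketch below (this assumes TF 1.x, and that tf.nn.log_uniform_candidate_sampler, which I believe is the sampler the loss uses by default, accepts a seed argument). I'm not sure this is the intended usage, though, or whether it controls all of the randomness:

# Sketch only: seed the negative sampling explicitly by constructing the
# candidate sampler myself and passing its output via `sampled_values`.
# `train_labels`, `num_sampled` and `vocabulary_size` stand in for the
# corresponding values in my graph.
sampled_values = tf.nn.log_uniform_candidate_sampler(
    true_classes=tf.cast(train_labels, tf.int64),  # the sampler expects int64 labels
    num_true=1,
    num_sampled=num_sampled,
    unique=True,
    range_max=vocabulary_size,
    seed=rnd_seed)

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                                 biases=softmax_biases,
                                                 inputs=embed,
                                                 labels=train_labels,
                                                 num_sampled=num_sampled,
                                                 num_classes=vocabulary_size,
                                                 sampled_values=sampled_values))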

[EDITED] This is (part of) my script

import numpy as np
# set a seed so that the results are consistent
rnd_seed = 1
np.random.seed(rnd_seed)

import math
import tensorflow as tf
tf.set_random_seed(rnd_seed)

embeddings_ini = np.random.uniform(low=-1, high=1, size=(self.vocabulary_size, self.embedding_size))

graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):

    train_dataset = tf.placeholder(tf.int32, shape=[None, None])
    train_labels = tf.placeholder(tf.int32, shape=[None, 1])
    valid_dataset = tf.constant(self.valid_examples, dtype=tf.int32)

    # Variables.
    initial_embeddings = tf.placeholder(tf.float32, shape=(self.vocabulary_size, self.embedding_size))
    embeddings = tf.Variable(initial_embeddings)

    softmax_weights = tf.Variable(
        tf.truncated_normal([self.vocabulary_size, self.embedding_size],
                            stddev=1.0 / math.sqrt(self.embedding_size), seed=rnd_seed))
    softmax_biases = tf.Variable(tf.zeros([self.vocabulary_size]))

    # Model.
    # Look up embeddings for inputs.
    if self.model == "skipgrams":
        # Skipgram model
        embed = tf.nn.embedding_lookup(embeddings, train_dataset)
    elif self.model == "cbow":
        # CBOW Model
        embeds = tf.nn.embedding_lookup(embeddings, train_dataset)
        embed = tf.reduce_mean(embeds, 1, keep_dims=False)

    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                                     biases=softmax_biases,
                                                     inputs=embed,
                                                     labels=train_labels,
                                                     num_sampled=self.num_sampled,
                                                     num_classes=self.vocabulary_size))
  • Can you provide the full script so we can see what the possible sources of uncontrolled randomness may be? – Anis Aug 25 '17 at 11:04
  • The idea of the `sampled_values` parameter is that you pass the output of one of the `*_candidate_sampler` functions (you can look them up [here](https://www.tensorflow.org/api_docs/python/tf/nn), although they are not grouped into a common section or anything). But if you use [`tf.set_random_seed`](https://www.tensorflow.org/api_docs/python/tf/set_random_seed) it should be reproducible even if you don't pass any. Can you confirm you are [setting the seed within the graph](https://stackoverflow.com/questions/36288235/how-to-get-stable-results-with-tensorflow-setting-random-seed)? – jdehesa Aug 25 '17 at 11:10
  • I agree. `_compute_sampled_logits` doesn't pass any seed to the candidate sampler, so it all comes down to your graph's seed. – Anis Aug 25 '17 at 11:16
  • @Anis I edited the post and included part of my script that should be relevant to the question. Do you have any idea what is going wrong? – Brian Aug 29 '17 at 08:26
  • Have you had a look at what we were saying about setting the graph's seed? – Anis Aug 29 '17 at 10:29
  • I have but I can't fully understand what it means. Does it mean making sure that the `seed` argument is used wherever I can? What about initializing the seed inside the graph? Is it something that would make a difference? – Brian Aug 30 '17 at 12:41

1 Answer


I finally found out how to make the results reproducible. As @Anis suggested, I needed to set the graph-level seed, which can be done with:

with graph.as_default(), tf.device('/cpu:0'):
    tf.set_random_seed(1234)
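
For completeness, this is roughly what the structure looks like now (the important part being that the graph-level seed is set inside the graph's context, before any ops are created):

import tensorflow as tf

graph = tf.Graph()
with graph.as_default(), tf.device('/cpu:0'):
    # Set the graph-level seed first, before any ops are created,
    # so that every op in this graph derives its seed from it.
    tf.set_random_seed(1234)

    # ... placeholders, variables, embedding lookup and the
    # sampled softmax loss are built here, as in the question ...

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    # ... training loop; rerunning the script now gives the same numbers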