
I am writing a tensorflow.keras wrapper to run ML experiments.

I need my framework to be able to perform an experiment as specified in a YAML configuration file and run it in parallel on a GPU.

Then I need a guarantee that if I ran the experiment again I would get, if not exactly the same results, something reasonably close.

To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation:

import random
import numpy as np
import tensorflow as tf
# Set up random seeds
random.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)

This has proven to not be enough.

I ran the same configuration 4 times, and plotted the results:

[Plot: training curves from 4 runs of the same configuration]

As you can see, results vary a lot between runs.

How can I set up a training session in Keras to ensure I get reasonably similar results when training on a GPU? Is this even possible?

The full training script can be found here.

Some of my colleagues are using just pure TF, and their results seem far more consistent. What is more, they do not seem to be seeding any randomness except to ensure that the train and validation split is always the same.

Jsevillamol
  • Personally I never seed anything other than the train/test split. I am not familiar with your data and I can't say with certainty, but in my experience the things that most commonly break training are 1. batch size and 2. the optimizer and learning rate. Looking at your script, you are using an Adam optimizer, which I personally adore, but it can overshoot significantly if you don't get your learning rate right. At a glance, I'd say explore your data more to figure out the magnitudes of variation and play with batch sizes and learning rate (assuming there aren't bugs in your model somewhere). – Alexander Ejbekov Mar 16 '19 at 20:34
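For concreteness, this is where the learning rate and batch size mentioned in that comment enter a tf.keras training run (toy data, toy model and values, purely illustrative and not tuned for the asker's data):

import numpy as np
from tensorflow import keras

# Toy data and model, only to show where learning rate and batch size are set
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# A smaller learning rate makes Adam less prone to overshooting
# (the argument is `lr=` on very old tf.keras versions)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
model.fit(x, y, batch_size=32, epochs=5)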

2 Answers


Keras + TensorFlow.

Step 1: disable the GPU.

import os
# Hide all GPUs so TensorFlow falls back to the CPU, where ops are deterministic
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Step 2: seed every library that introduces randomness into your code, i.e. tensorflow, numpy and random.

import tensorflow as tf
import numpy as np
import random as rn

sd = 1  # Here sd means seed.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED'] = str(sd)

from keras import backend as K

# Force single-threaded execution so the op execution order is deterministic (TF 1.x API)
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)

Make sure both pieces of code are included at the very start of your script, and the results will be reproducible.
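If you are on TensorFlow 2.x, where ConfigProto, Session and set_random_seed no longer exist, a rough equivalent of the same setup would be the sketch below (untested; it uses the 2.x replacements tf.random.set_seed and tf.config.threading):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""       # step 1: hide the GPU
os.environ["PYTHONHASHSEED"] = "1"

import random as rn
import numpy as np
import tensorflow as tf

sd = 1  # seed
rn.seed(sd)
np.random.seed(sd)
tf.random.set_seed(sd)                        # replaces tf.set_random_seed

# Single-threaded execution replaces the ConfigProto/Session block above
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

Recent releases also bundle the Python/NumPy/TensorFlow seeding into a single call, tf.keras.utils.set_random_seed(sd), if your version ships it.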

guorui
  • Can you explain why to disable the GPU? The GPUs are a crucial element of the training.... – Nathan B Jul 28 '20 at 13:09
  • 2
  • @NadavB the docs explain it well: https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development. Specifically "Moreover, when running on a GPU, some operations have non-deterministic outputs, in particular tf.reduce_sum(). This is due to the fact that GPUs run many operations in parallel, so the order of execution is not always guaranteed." – cddt Sep 25 '20 at 23:55
  • @cddt, interesting, but CPUs also run in parallel when you have more than one core, don't they? – Nathan B Sep 27 '20 at 09:51
  • @guorui `AttributeError: module 'tensorflow' has no attribute 'ConfigProto'` – jtlz2 Aug 10 '21 at 19:15
  • `AttributeError: module 'tensorflow' has no attribute 'set_random_seed'` – jtlz2 Aug 10 '21 at 19:18
  • `AttributeError: module 'tensorflow' has no attribute 'Session'` ... – jtlz2 Aug 10 '21 at 19:19

Try adding seed parameters to your weight/bias initializers, just to add more specifics to Alexander Ejbekov's comment.

TensorFlow has two kinds of random seeds: graph-level and op-level. If you are using more than one graph, you need to specify the seed in every one. You can override the graph-level seed at the op level by setting the seed parameter in the relevant function, and you can even make ops from different graphs output the same value if the same seed is set. Consider this example:

import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    tf.set_random_seed(1)  # graph-level seed
    # 'a' relies on the graph-level seed, 'b' overrides it with its own op-level seed
    a = tf.get_variable('a', shape=(1,), initializer=tf.keras.initializers.glorot_normal())
    b = tf.get_variable('b', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=2))
with tf.Session(graph=g1) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
    print(sess.run(b))

g2 = tf.Graph()
with g2.as_default():
    # op-level seed set to the same value as g1's graph-level seed
    a1 = tf.get_variable('a1', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=1))

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a1))

In this example, the output of a is the same as that of a1, but b is different.

Sharky
  • Thank you for your answer! How can I adapt this if I am using the `tensorflow.keras` interface instead of bare TF sessions? – Jsevillamol Mar 16 '19 at 21:02
  • you can set seeds in your keras.layers, and you can access the graph with `tf.get_default_graph()` – Sharky Mar 16 '19 at 21:29
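Building on that last comment, a minimal tf.keras sketch of per-layer seeding (the layer sizes, activations and seed values are illustrative):

import tensorflow as tf

# Seed each initializer (and any stochastic layer such as Dropout) explicitly,
# so that weight initialization is repeatable across runs
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_initializer=tf.keras.initializers.glorot_normal(seed=1)),
    tf.keras.layers.Dropout(0.5, seed=2),
    tf.keras.layers.Dense(
        10, activation="softmax",
        kernel_initializer=tf.keras.initializers.glorot_normal(seed=3)),
])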