10

I have the problem that I am not able to reproduce my results with Keras and TensorFlow.

It seems that a workaround for this issue was recently published on the Keras documentation site, but somehow it doesn't work for me.

What am I doing wrong?

I'm using a Jupyter Notebook on an MBP Retina (without an Nvidia GPU).

# ** Workaround from Keras Documentation **

import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926

import os
os.environ['PYTHONHASHSEED'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)


# ** Workaround end **

# ** Start of my code **


# LSTM and CNN for sequence classification in the IMDB dataset
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from sklearn import metrics
# fix random seed for reproducibility
#np.random.seed(7)

# ... importing data and so on ...

# create the model
embedding_vector_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Python version used:

Python 3.6.3 |Anaconda custom (x86_64)| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

The workaround is already included in the code (without effect).

Every time I run the training part, I get different results.

When I reset the kernel of the Jupyter Notebook, the 1st run matches the 1st run from before the reset, the 2nd matches the 2nd, and so on.

So after a reset I will always get, for example, 0.7782 on the first run, 0.7732 on the second run, etc.

But without a kernel reset, the results are different each time I run it.
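For debugging, a quick check can be run before each training run (a sketch, assuming the `np` and `rn` imports from the workaround above):

# Print a short fingerprint of the NumPy and core Python RNG states.
# If these change between runs within the same kernel, the seeds are
# being consumed by training rather than being reset.
print(np.random.get_state()[1][:3])
print(rn.getstate()[1][:3])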

I would be thankful for any suggestion!

MBT
  • Can you add `np.random.get_state()` and `rn.getstate()` to the output? Do you use GPU or CPU? Can you try the script in `python`? – Maxim Oct 20 '17 at 16:48
  • Please have a look at my response here (https://stackoverflow.com/a/52897216/9024698) for when using the CPU. – Outcast Oct 22 '18 at 09:56
  • @PoeteMaudit Thank you for this answer. I wanted to reply to your answer, but then it was gone :) This question here is quite old; I had actually already stumbled over the doc entry you posted (but forgot to report it here). It did indeed succeed in getting reproducible results, however it is extremely slow with only a single thread. Also, to make it work in Jupyter Notebooks, it was necessary to restart the kernel every time (I haven't found a way to reset the seed manually without reloading all the data, but I don't know if that has changed since then). Just wanted to mention these two points. Still, thank you! – MBT Oct 22 '18 at 10:09
  • Thank you for your comment! It is very good that you mention all these things, and I am going to keep them in mind. In my case it was not slow at all with `Python 3.6` and `PyCharm`. Anyway, I was very pleased to see that I was getting the exact same results...haha...see you for now! – Outcast Oct 22 '18 at 10:29
  • I have posted an answer to [a similar question](https://stackoverflow.com/a/57121117/9501391), and it has been accepted. The key point to reproduce the same result is to disable the GPU. Hope it can solve your problem. – guorui Jul 20 '19 at 02:16
  • @guorui That answer needs a rework for 2021 :) – jtlz2 Aug 10 '21 at 19:20

4 Answers

8

I had exactly the same problem and managed to solve it by closing and restarting the TensorFlow session every time I run the model. In your case it should look like this:

#START A NEW TF SESSION
np.random.seed(0)
tf.set_random_seed(0)
sess = tf.Session(graph=tf.get_default_graph())
K.set_session(sess)

embedding_vector_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

#CLOSE TF SESSION
K.clear_session()

I ran the following code and got reproducible results using the GPU and the TensorFlow backend:

# np, tf and K are assumed to be imported as in the question's workaround
from datetime import datetime
from keras.layers import Input, Dense
from keras.models import Model

print(datetime.now())
for i in range(10):
    np.random.seed(0)
    tf.set_random_seed(0)
    sess = tf.Session(graph=tf.get_default_graph())
    K.set_session(sess)

    n_classes = 3
    n_epochs = 20
    batch_size = 128

    task = Input(shape = x.shape[1:])
    h = Dense(100, activation='relu', name='shared')(task)
    h1= Dense(100, activation='relu', name='single1')(h)
    output1 = Dense(n_classes, activation='softmax')(h1)

    model = Model(task, output1)
    model.compile(loss='categorical_crossentropy', optimizer='Adam')
    model.fit(x_train, y_train_onehot, batch_size = batch_size, epochs=n_epochs, verbose=0)
    print(model.evaluate(x=x_test, y=y_test_onehot, batch_size=batch_size, verbose=0))  # one score per iteration
K.clear_session()

And obtained this output:

2017-10-23 11:27:14.494482
0.489712882132
0.489712893813
0.489712892765
0.489712854426
0.489712882132
0.489712864011
0.486303713004
0.489712903398
0.489712892765
0.489712903398

My understanding is that if you don't close your tf session (which you are doing implicitly by running in a new kernel), you keep sampling from the same "seeded" distribution.
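A minimal sketch of this behavior (assuming TF 1.x, where `tf.set_random_seed` sets the graph-level seed and random ops keep per-session state):

import tensorflow as tf

tf.set_random_seed(42)
x = tf.random_normal([1])

with tf.Session() as sess:
    a1 = sess.run(x)  # first draw from the seeded stream
    a2 = sess.run(x)  # differs from a1: same stream, next draw

with tf.Session() as sess:
    b1 = sess.run(x)  # equals a1: a fresh session replays the stream from the start

So opening a fresh session (or resetting the kernel) is what rewinds the random stream to its seeded starting point.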

osmelu
  • Seems like there are still some small differences in the score, even though it is close. Right now I need to make a ranking depending on the probability, so even small differences matter (therefore I switched to the Theano backend for this purpose). But thank you! I will try it and see what results I get. – MBT Oct 23 '17 at 20:09
  • Thanks a lot!! After trying all the solutions on the internet, yours worked perfectly. – user239457 Jan 24 '19 at 18:41
2

My answer is the following, which uses Keras with TensorFlow as the backend. Within your nested for loops, where one typically iterates over the various parameters to explore during model development, call this function at the top of the innermost loop body:

for ...:
    for ...:
        reset_keras()
        # ... build, fit, and evaluate the model for this parameter combination ...

where the reset function is defined as

def reset_keras():
    # destroy the current session together with all Keras/TF state
    sess = tf.keras.backend.get_session()
    tf.keras.backend.clear_session()
    sess.close()
    # start a fresh session and re-seed NumPy and TensorFlow
    sess = tf.keras.backend.get_session()
    np.random.seed(1)
    tf.set_random_seed(2)

PS: The function above also prevents your Nvidia GPU from building up too much memory (which happens after many iterations) and eventually becoming very slow; so the function both restores GPU performance and keeps the results reproducible.

ArmandduPlessis
0

Looks like a bug in TensorFlow or Keras, I'm not sure which. When setting the Keras backend to CNTK, the results are reproducible.

I even tried several versions of TensorFlow, from 1.2.1 to 1.13.1. With all of them, the results of multiple runs don't agree, even when the random seeds are set.
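In case it helps others trying this: the backend can be switched by setting the `KERAS_BACKEND` environment variable before Keras is first imported (a sketch; the `cntk` value assumes CNTK is installed, and `~/.keras/keras.json` can be edited instead):

import os
os.environ['KERAS_BACKEND'] = 'cntk'  # must be set before the first keras import
import keras  # should report: Using CNTK backend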

0

The thing that worked for me was to run the training every time in a new console. In addition to this, I also have these parameters set:

import os
import random
import numpy as np
import tensorflow as tf
from keras import backend as K

RANDOM_STATE = 42

os.environ['PYTHONHASHSEED'] = str(RANDOM_STATE)
random.seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)
tf.set_random_seed(RANDOM_STATE)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

intra_op_parallelism_threads could also be set to a bigger value.
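One caveat: assigning `os.environ['PYTHONHASHSEED']` inside an already-running interpreter does not affect Python's hash randomization, which is fixed at interpreter startup. To be safe, set it in the console before launching the script, e.g. `PYTHONHASHSEED=42 python train.py` (where `train.py` stands in for your script).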