Tensorflow-Keras reproducibility problem on Google Colab

Question

I have a simple code to run on Google Colab (I use CPU mode):

import numpy as np
import pandas as pd

## LOAD DATASET

datatrain = pd.read_csv("gdrive/My Drive/iris_train.csv").values
xtrain = datatrain[:,:-1]
ytrain = datatrain[:,-1]

datatest = pd.read_csv("gdrive/My Drive/iris_test.csv").values
xtest = datatest[:,:-1]
ytest = datatest[:,-1]

import tensorflow as tf
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.utils import to_categorical

## SET ALL SEED

import os
os.environ['PYTHONHASHSEED']=str(66)

import random
random.seed(66)

np.random.seed(66)
tf.set_random_seed(66)

from tensorflow.keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

## MAIN PROGRAM

ycat = to_categorical(ytrain) 

# build model
model = tf.keras.Sequential()
model.add(Dense(10, input_shape=(4,)))
model.add(Activation("sigmoid"))
model.add(Dense(3))
model.add(Activation("softmax"))

#choose optimizer and loss function
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# train
model.fit(xtrain, ycat, epochs=15, batch_size=32)

#get prediction
classes = model.predict_classes(xtest)

#get accuration
accuration = np.sum(classes == ytest)/len(ytest) * 100

I have read the setup to create a reproducibility code here Reproducible results using Keras with TensorFlow backend and I put all code in the same cell. But the result (e.g. the loss) is always different every time I run that cell (run the cell using shift + enter).

In my case, the result from the code above can be reproduced, if only:

I run using "runtime" > "restart and run all" or,
I put that code in a single file and run it using the command line (python3 file.py)

is there something I miss to make the result reproducible without restart the runtime?

score 4 · Accepted Answer · answered Aug 07 '19 at 18:14

4

You should also fix the seed for kernel_initializer in your Dense layers. So, your model will be like:

model = tf.keras.Sequential()
model.add(Dense(10, kernel_initializer=keras.initializers.glorot_uniform(seed=66), input_shape=(4,)))
model.add(Activation("sigmoid"))
model.add(Dense(3, kernel_initializer=keras.initializers.glorot_uniform(seed=66)))
model.add(Activation("softmax"))

answered Aug 07 '19 at 18:14

mohaghighat

1,293
17
29

thank you!, so every we initialize a layer with weight, we should set seed the kernel initializer, right? – malioboro Aug 08 '19 at 02:27
That's right. You need to fix the seed for any layer that has weights/biases (e.g., Dense, Conv2D). – mohaghighat Aug 08 '19 at 17:51
From [keras docs](https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development): Note that you don't have to set seeds for individual initializers in your code if you do the steps above, because their seeds are determined by the combination of the seeds set above. – Winston Oct 11 '22 at 17:02

Ali karimi · Answer 2 · 2022-08-19T08:52:35.107

I tried most of the solutions on the web and just the following codes worked for me :

seed=0
import os
os.environ['PYTHONHASHSEED'] = str(seed)
# For working on GPUs from "TensorFlow Determinism"
os.environ["TF_DETERMINISTIC_OPS"] = str(seed)
import numpy as np
np.random.seed(seed)
import random
random.seed(seed)
import tensorflow as tf
tf.random.set_seed(seed)

note that you should call this code before every run(at least for me)

if you want run your code on CPU:

    seed=0
    import os
    os.environ['PYTHONHASHSEED'] = str(seed)
    # For working on GPUs from "TensorFlow Determinism"
    os.environ['CUDA_VISBLE_DEVICE'] = ''
    import numpy as np
    np.random.seed(seed)
    import random
    random.seed(seed)
    import tensorflow as tf
    tf.random.set_seed(seed)

score 1 · Answer 3 · answered Oct 09 '19 at 19:02

I've tried to get Tensorflow 2.0 working reproducibly using Keras and Google Colab (CPU), with a version of the Iris dataset processing similar to that described above by @malioboro. This seems to work - might be useful:

# Install TensorFlow
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

# Setup repro section from Keras FAQ with TF1 to TF2 adjustments

import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/

session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                                        inter_op_parallelism_threads=1)

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.compat.v1.set_random_seed(1234)

sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)

# Rest of code follows ...
# Some adopted from: https://janakiev.com/notebooks/keras-iris/
# Some adopted from the question.
#
# Load Data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

iris = load_iris()
X = iris['data']
y = iris['target']
names = iris['target_names']
feature_names = iris['feature_names']

# One hot encoding
enc = OneHotEncoder()
Y = enc.fit_transform(y[:, np.newaxis]).toarray()

# Scale data to have mean 0 and variance 1 
# which is importance for convergence of the neural network
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data set into training and testing
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.5, random_state=2)

n_features = X.shape[1]
n_classes = Y.shape[1]

## MAIN PROGRAM
from tensorflow.keras.layers import Dense, Activation 

# build model
model = tf.keras.Sequential()
model.add(Dense(10, input_shape=(4,)))
model.add(Activation("sigmoid"))
model.add(Dense(3))
model.add(Activation("softmax"))

#choose optimizer and loss function
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# train
model.fit(X_train, Y_train, epochs=20, batch_size=32)

#get prediction
classes = model.predict_classes(X_test)

Tensorflow-Keras reproducibility problem on Google Colab

3 Answers3

Linked