
I'm doing text classification with a deep neural network in Keras, following a tutorial, but when I run the following code several times, I get slightly different results.

For example, the test loss in the first run is 0.88815, and in the second run it is 0.89030, which is slightly higher. I wonder where the randomness comes from?

import keras
from keras.datasets import reuters


(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None, test_split=0.2)
word_index = reuters.get_word_index(path="reuters_word_index.json")



print('# of Training Samples: {}'.format(len(x_train)))
print('# of Test Samples: {}'.format(len(x_test)))

num_classes = max(y_train) + 1
print('# of Classes: {}'.format(num_classes))

index_to_word = {}
for key, value in word_index.items():
    index_to_word[value] = key

print(' '.join([index_to_word[x] for x in x_train[0]]))
print(y_train[0])


from keras.preprocessing.text import Tokenizer

max_words = 10000

tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


print(x_train[0])
print(len(x_train[0]))

print(y_train[0])
print(len(y_train[0]))


from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))



model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.metrics_names)

batch_size = 32
epochs = 3

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

4 Answers


This is the usual behavior of Keras. See this discussion in the issue list of the Keras repository on GitHub.

For example, the fit function has a shuffle argument (its ninth positional argument), which is set to True by default. So in each epoch the training data is shuffled before running, and this alone makes the results change from run to run.
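
For illustration, a minimal sketch (reusing the variable names from the question's code) of turning that shuffling off; note that weight initialization and dropout are still random:

# Disabling shuffling removes one source of run-to-run variation;
# weight initialization and dropout remain random.
history = model.fit(x_train, y_train, batch_size=batch_size,
                    epochs=epochs, shuffle=False, verbose=1,
                    validation_split=0.1)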

Setting a random seed would help, but it still won't make the results exactly reproducible; some operations, on GPUs in particular, remain nondeterministic.

Gimhani

If you want to get the same result each time, you need to set a random seed. See also https://machinelearningmastery.com/reproducible-results-neural-networks-keras/.

This can be done by just adding:

from numpy.random import seed
seed(42)

And in case you are using the TensorFlow backend, you also need to add:

from tensorflow import set_random_seed
set_random_seed(42)

The 42 is just an arbitrary number you can choose at will. It is simply a constant for the random seed, so that you always get the same random initialization for your weights, which in turn gives you the same results.
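
One caveat, shown here as a minimal sketch using the same TensorFlow 1.x-era calls as above: the seeds only affect random numbers drawn after they are set, so run them before the model is defined:

from numpy.random import seed
seed(42)                        # seed NumPy before any weights exist
from tensorflow import set_random_seed
set_random_seed(42)             # seed the TensorFlow graph as well

# ... only now define and compile the Keras model ...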

WurzelseppQX

As mentioned in the Keras FAQ, add the following code:

import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926

import os
os.environ['PYTHONHASHSEED'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

# Rest of code follows ...
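
As a quick check (a sketch that reuses the model and data from the question, not part of the FAQ snippet): run the script twice from the shell; with the configuration above, both runs should print the same numbers on CPU:

# Hypothetical verification: with the seeding and the single-threaded
# session above, two separate runs of this script should print
# identical values (on CPU).
history = model.fit(x_train, y_train, batch_size=32, epochs=3,
                    verbose=0, validation_split=0.1)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss: {:.6f}'.format(score[0]))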
Vikranth

I did not check on GPU, but on CPU fixing the seeds as above did not seem to work with TensorFlow 1 as the Keras backend. Switching from TensorFlow 1 to TensorFlow 2 made the seed fixing work. For example, this works for me:

import os
import numpy as np
import random as rn
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = '0'  # fix Python's hash randomization
np.random.seed(1)                   # seed NumPy
rn.seed(1)                          # seed core Python
tf.random.set_seed(1)               # TF2 replacement for tf.set_random_seed
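
As a side note, newer TensorFlow 2 releases bundle these seed calls into one utility; a minimal sketch (tf.keras.utils.set_random_seed is available from TensorFlow 2.7 onward, and the op-determinism switch from around 2.8):

import tensorflow as tf

# One call seeds Python's random module, NumPy, and TensorFlow.
tf.keras.utils.set_random_seed(1)

# Optionally force deterministic op implementations where available;
# this can slow training down.
tf.config.experimental.enable_op_determinism()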
Frightera