
I want reproducible results for the CNNs I train. Hence I set the seed in my script:

import tensorflow as tf
tf.set_random_seed(0)  # make sure results are reproducible
import numpy as np
np.random.seed(0)  # make sure results are reproducible

The docs of set_random_seed and np.random.seed do not report any special behaviour for a seed of 0.

When I run the same script twice on the same machine within a couple of minutes, and without making any updates, I expect to get the same results. However, this is not the case:

Run 1:

0;0.001733;0.001313
500;0.390164;0.388188

Run 2:

0;0.006986;0.007000
500;0.375288;0.374250

How can I make the network produce reproducible results?

System

$ python -c "import tensorflow;print(tensorflow.__version__)"                
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.0

$ python -c "import numpy;print(numpy.__version__)"
1.12.0
Martin Thoma
    TF applies some graph optimizations/pruning before running. I'm not sure if that procedure is deterministic. – Kh40tiK Feb 19 '17 at 13:54
  • @Kh40tiK Do you know if there is a way to make it deterministic? Do you know where I could ask for an answer to this question? – Martin Thoma Feb 19 '17 at 13:56
  • [TF mailing list](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss) or [github issues](https://github.com/tensorflow/tensorflow) if this is a missing feature. – Kh40tiK Feb 19 '17 at 14:01
  • Somebody just had another idea why the results might not be reproducible: [Floating point multiplication is not associative](https://en.wikipedia.org/wiki/Associative_property#Nonassociativity_of_floating_point_calculation). It might be that the order in which the nodes compute their result differs between trainings. Hence the different results. – Martin Thoma Mar 05 '17 at 12:21
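
A quick way to see the non-associativity mentioned in the last comment (using addition, where the same effect is easy to trigger):

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6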

2 Answers


While I didn't solve the problem, here are possible reasons why the results are not always the same, roughly ordered from most likely / easiest to fix to most unlikely / hardest to fix. After each problem, I also try to give a solution.

  1. Human error - you misread a number / made a typo when you copied a result from one shell to the paper: Logging. Create a 2017-12-31-23-54-experiment-result.log file for every single experiment you run - not manually, but generated by the experiment itself. Put the timestamp in the name so it is easier to find again. Everything that follows should be logged to that file for every single experiment.
  2. Code changed: Version control (e.g. git)
  3. Configuration file changed: Version control
  4. Pseudorandom numbers changed: set the seed for random / tensorflow / numpy (yes, you might have to set more than one seed; see the sketch after this list)
  5. Data loaded differently / in a different order: Version control + seed (is the preprocessing really the same?)
  6. Environment variables changed: Docker
  7. Software (version) changed: Docker
  8. Driver (version) changed: Logging
  9. Hardware changed: Logging
  10. Hardware/Software has some inherent reproducibility problems, such as the fact that floating point multiplication is not associative and that different cores on a GPU might finish computations at different times (I'm not sure about this)
  11. Hardware has errors
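
For point 4, a minimal sketch of the seeds that typically have to be set in a TensorFlow 1.x / numpy setup (the builtin random module is included in case your own code or a dependency uses it):

import random
random.seed(0)         # Python's builtin PRNG

import numpy as np
np.random.seed(0)      # numpy's PRNG (e.g. shuffling, augmentation)

import tensorflow as tf
tf.set_random_seed(0)  # TensorFlow's graph-level seed (set it again if you create a new graph)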

In any case, running the "same" thing multiple times might help to get a gut feeling for how different things are.

Writing a paper

If you write a paper, I think the following would be the best practice for reproducibility:

  1. Add a link to a repository (e.g. git) where all code is
  2. The code has to be containerized (e.g. Docker)
  3. If there is Python code and a requirements.txt, give the exact software versions: not something like tensorflow>=1.0.0, but tensorflow==1.2.3
  4. Add the git hash of the version you used for the experiments. There might be different hashes if you changed something in between (a sketch of how to record the hash automatically follows this list).
  5. Always log information about drivers (e.g. like this for nVidia) and hardware, and add this to the appendix of your paper. In case of later changes, one can then at least check whether something changed that might cause the numbers to differ.
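
For point 4, a small sketch of how the current commit hash could be recorded automatically (assuming git is installed and the experiment is started from inside the repository):

import subprocess


def get_git_hash():
    """Return the hash of the currently checked-out git commit."""
    git_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
    return git_hash.decode('utf-8').strip()


print("Git commit: {}".format(get_git_hash()))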

For logging the versions, you might want to use something like this:

#!/usr/bin/env python

# core modules
import subprocess


def get_logstring():
    """
    Get important environment information that might influence experiments.

    Returns
    -------
    logstring : str
    """
    logstring = []
    with open('/proc/cpuinfo') as f:
        cpuinfo = f.readlines()
    for line in cpuinfo:
        if "model name" in line:
            logstring.append("CPU: {}".format(line.strip()))
            break

    with open('/proc/driver/nvidia/version') as f:
        version = f.read().strip()
    logstring.append("GPU driver: {}".format(version))

    # The last line of `gcc -v` (printed to stderr) is the gcc version string.
    gcc_version = subprocess.check_output(["gcc", "-v"],
                                          stderr=subprocess.STDOUT)
    gcc_version = gcc_version.decode("utf-8").strip().splitlines()[-1]
    logstring.append("GCC version: {}".format(gcc_version))

    logstring.append("VGA: {}".format(find_vga()))
    return "\n".join(logstring)


def find_vga():
    """Return the VGA / 3D controller lines reported by lspci."""
    vga = subprocess.check_output(r"lspci | grep -i 'vga\|3d\|2d'",
                                  shell=True,
                                  executable='/bin/bash')
    # check_output returns bytes; decode so the log does not contain b'...'
    return vga.decode("utf-8").strip()


print(get_logstring())

which gives something like

CPU: model name    : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
GPU driver: NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.90  Tue Sep 19 19:17:35 PDT 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
VGA: 00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
Martin Thoma

Might be a scope problem. Make sure to set the seed within the scope in which you use the graph, e.g.:

with tf.Graph().as_default():
    tf.set_random_seed(0)

This also has to be done after calling tf.reset_default_graph().
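
A minimal sketch of both variants (assuming the TensorFlow 1.x graph-mode API; each printed tensor should be identical across runs of the script, at least on CPU):

import tensorflow as tf

# Variant 1: set the seed inside an explicit graph scope
with tf.Graph().as_default():
    tf.set_random_seed(0)          # graph-level seed, set before any stochastic op is created
    x = tf.random_normal([2, 2])
    with tf.Session() as sess:
        print(sess.run(x))

# Variant 2: set the seed again after resetting the default graph
tf.reset_default_graph()
tf.set_random_seed(0)
y = tf.random_normal([2, 2])
with tf.Session() as sess:
    print(sess.run(y))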

For a full example, see How to get stable results with TensorFlow, setting random seed

whiletrue
  • After adding it after `tf.reset_default_graph`, the first epoch is consistent. However, the results differ after training for a while. I made 5 runs and none of them had the same result, while all were quite similar: http://pastebin.com/7j4SPS5V – Martin Thoma Feb 19 '17 at 13:54