
I have a project in which I cannot reproduce random numbers when I use numpy in combination with tensorflow. At the beginning of all my tests, I set

    import numpy as np
    import tensorflow as tf

    tf.set_random_seed(seed)
    np.random.seed(seed)

I have been debugging, and when I use numpy and no TF, all results are reproducible. When I add the TF code, the random numbers stop being reproducible. When I use both TF and numpy, I get the following results:

  1. TF variables are initialized to the same value every time (OK)
  2. When I use np.random.RandomState() with a set seed instead of direct calls to np.random.uniform(), np.random.normal(), etc., results are reproducible (OK)
  3. When I use direct calls to np.random.uniform(), np.random.normal(), etc., results are not reproducible (NOT OK; see the sketch below)
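
For clarity, this is roughly the difference between points 2 and 3, stripped down to a minimal sketch (the real sampling code is spread across several files):

    import numpy as np

    seed = 0

    # Pattern from point 2: a dedicated RandomState instance.
    # This stays reproducible for me even when the TF code runs.
    rng = np.random.RandomState(seed)
    a = rng.uniform(size=3)

    # Pattern from point 3: the global generator seeded via np.random.seed.
    # This stops being reproducible once the TF code is added.
    np.random.seed(seed)
    b = np.random.uniform(size=3)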

The difference between 2 and 3 makes me think that TF must be using numpy internally somewhere to generate random numbers. This sounds strange and unexpected. I have only one main thread, so the difference is definitely not caused by race conditions. Furthermore, even if TF uses np.random, this should not change the random numbers I observe between runs, since the sequence of queries to the random number generator is always the same.

What is even stranger is that the particular piece of TF code that makes results non-reproducible is the one computing and applying gradients, where I would not expect any random number generation to be needed. Note that I am comparing only the sampled random numbers, not the results from the network (since TF has some non-deterministic operations), and these random numbers are not affected in any way by the results produced from training the net.
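
One check I plan to run around the suspect step: snapshot the global NumPy RNG state with np.random.get_state() before and after the gradient-application call and compare the two. If the state changes across the TF call, something inside TF is drawing from the global generator. Below is a minimal sketch of that check; the tiny graph is only a stand-in for my real network, which I cannot post.

    import numpy as np
    import tensorflow as tf

    seed = 0
    tf.set_random_seed(seed)
    np.random.seed(seed)

    # Stand-in graph: a variable, a loss, and a gradient-application op.
    w = tf.Variable(tf.random_normal([10]))
    loss = tf.reduce_sum(tf.square(w))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        before = np.random.get_state()
        sess.run(train_op)  # the kind of step that breaks reproducibility for me
        after = np.random.get_state()
        unchanged = all(np.array_equal(x, y) for x, y in zip(before, after))
        print("Global NumPy RNG state unchanged across the TF step:", unchanged)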

Sorry, I am unable to post my code; it is just too big, and reducing it to a smaller sample would likely make the problem go away. Any suggestions on how to debug this further are welcome.
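
One debugging idea I am considering (a rough sketch, nothing TF-specific; I would wrap only the np.random entry points my project actually calls): replace the global np.random functions with wrappers that print a short stack trace on every call, which should reveal whether anything inside the TF code path draws from the global generator.

    import traceback
    import numpy as np

    def _traced(fn):
        def wrapped(*args, **kwargs):
            print("np.random.%s called from:" % fn.__name__)
            traceback.print_stack(limit=5)
            return fn(*args, **kwargs)
        return wrapped

    # Wrap only the entry points used in the project.
    np.random.uniform = _traced(np.random.uniform)
    np.random.normal = _traced(np.random.normal)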

EDIT: I discovered that this happens only on GPU and does not occur on CPU.

niko
  • I have seen people complaining about that kind of issue before e.g. [issue #16889](https://github.com/tensorflow/tensorflow/issues/16889). Are you running the scripts on the same interpreter or do you spawn a new process each time? (I seem to remember someone saying running a new interpreter every time fixed the issue for them). Also, unfounded suggestion, have you tried setting NumPy seed _before_ importing TensorFlow? – jdehesa Apr 25 '18 at 15:17
  • Ah and [`random.seed`](https://docs.python.org/3/library/random.html#random.seed) just in case? :/ – jdehesa Apr 25 '18 at 15:19
  • Thanks for the suggestions. I spawn a new process every time and have `random.seed` set as well. I tried setting numpy's seed before importing tensorflow, but I still get the same problem. Just to confirm, I can still expect `np.random.seed` to generate numbers deterministically if I set it in one file of the project and call `np.random.uniform` in another file, right? – niko Apr 25 '18 at 15:41
  • Yes, `np.random.seed` sets the seed for NumPy's global `RandomState`. As far as I have worked with it, I have never found "surprising" behavior in NumPy's RNG ([you just need to be careful with multiprocessing](https://stackoverflow.com/questions/12915177/same-output-in-different-workers-in-multiprocessing)). – jdehesa Apr 25 '18 at 15:46
  • Thanks, that's my understanding too. I am not using multiprocessing anywhere. Might be a bug in TF – niko Apr 25 '18 at 15:52
  • I think you have a typo: your last sentence has GPU twice – c2huc2hu Apr 25 '18 at 20:21
  • Oops, thanks for pointing it out. – niko Apr 26 '18 at 08:43
  • In the meantime, have you tried looking at https://github.com/NVIDIA/framework-determinism and the solutions there? – Peter O. Jul 11 '20 at 18:55

0 Answers