
After realizing that some TensorFlow operations are non-deterministic (see this question: How to get the same loss value, every time training a CNN (MNIST data set), with TensorFlow?), I want to know:

How can I build a convolutional neural net with:

  • TensorFlow Version 1.1.0
  • CUDA release 8.0, V8.0.61
  • cuDNN 5.1.10
  • run on GPU

that uses only deterministic operations?

Milan
  • I am curious why you would need a deterministic training. Debugging is definitely a valid reason, do you have something else in mind? – P-Gn Jun 28 '17 at 11:20
  • @user1735003 yes, debugging, and I want to compare the results from models with different parameters. For a meaningful comparison I need reliable data, but with non-deterministic operations I get different results for the same input and the same model, so the data is not reliable. – Milan Jun 28 '17 at 13:48

1 Answer


You can't, as long as the operations in cuDNN are not completely deterministic. Moreover, even if you moved every operation off the GPU, code that uses SSE instructions (and it probably does) can produce different results when you execute the same identical, randomness-free code more than once.
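The root cause in both cases is that floating-point addition is not associative, so any change in the order of operations (a different SIMD path, a different reduction order on the GPU) can change the last bits of the result. A minimal illustration in plain Python, with arbitrary example values:

```python
# Floating-point addition is not associative: regrouping the
# same three operands changes the low-order bits of the sum.
x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)

print(x == y)  # False: the two groupings disagree
print(x, y)
```

This is why a parallel reduction that sums the same numbers in a different order across runs, as some cuDNN backward kernels do, gives slightly different gradients each time.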

The best thing you can do to get close to your goal is to set the seed for every operation that contains randomness (the op-level seed) and to set the seed for the whole graph (tf.set_random_seed(value)).
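A sketch of the two seed levels, written against the TF 1.x graph API the question uses (under TF 2.x the same calls live in tf.compat.v1; the seed values 42 and 7 are arbitrary):

```python
import tensorflow as tf

# TF 2.x keeps the 1.x graph API under tf.compat.v1; on TF 1.x use tf itself.
tf1 = getattr(getattr(tf, "compat", tf), "v1", tf)
if hasattr(tf1, "disable_eager_execution"):
    tf1.disable_eager_execution()

def run_once():
    """Build a fresh graph with both seeds fixed and draw one random tensor."""
    graph = tf.Graph()
    with graph.as_default():
        tf1.set_random_seed(42)                # graph-level seed
        w = tf1.random_normal([3, 3], seed=7)  # op-level seed
        with tf1.Session(graph=graph) as sess:
            return sess.run(w)

a, b = run_once(), run_once()
print((a == b).all())  # identical draw on every run
```

With both seeds fixed, the random initialization is reproducible across runs; the caveat above still applies, though, since some cuDNN kernels remain non-deterministic even with fixed seeds.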

nessuno
  • 1) Some cuDNN operations are/can be deterministic, the question is which one. 2) Why would the use of SSE produce non-deterministic outputs? – P-Gn Jun 28 '17 at 13:25
  • 1) It's the gradient operation itself that for convolution operations is not deterministic in cuDNN 2) Read here: http://blog.nag.com/2011/02/wandering-precision.html – nessuno Jun 28 '17 at 13:38
  • 1) that depends on the algorithm that you choose, cuDNN let you pick backprop algorithms that are deterministic. 2) read also here: https://stackoverflow.com/questions/15147174/is-sse-floating-point-arithmetic-reproducible – P-Gn Jun 28 '17 at 13:39
  • 1) That's true. However, the OP would have to change the TF source code and recompile it (with the downside of a slowdown, but I guess he/she doesn't mind). 2) OK, it is well defined, but not every compiler produces well-aligned code (although they should; the article I linked says the opposite) – nessuno Jun 28 '17 at 14:12
  • Thanks @nessuno and @user1735003, that helps me understand more. So there is actually no way to get deterministic results for a CNN? For feed-forward nets it works, but not for convolutional neural nets. – Milan Jun 28 '17 at 14:26