
I've been developing some Transformer models, and writing some fairly comprehensive unit tests. In each unit test class I've been doing:

  def setUp(self):
    torch.manual_seed(123)

and it has worked nicely. I've now gone through and added .to(device) everywhere, and I get reproducible results for both "cpu" and "cuda" (the cuda device is an NVIDIA 3090). In other words, the randomly generated nn.Linear() weights and biases are the same on both devices.
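
A stripped-down illustration of what I mean (not my actual test code, and the layer sizes are made up, but this reflects the behaviour I'm seeing):

  import torch
  import torch.nn as nn

  # Seed, construct, then move to the device; the weights come out the same either way.
  torch.manual_seed(123)
  lin_cpu = nn.Linear(8, 8)

  torch.manual_seed(123)
  lin_cuda = nn.Linear(8, 8).to("cuda")

  torch.testing.assert_close(lin_cpu.weight, lin_cuda.weight.cpu())  # passes
  torch.testing.assert_close(lin_cpu.bias, lin_cuda.bias.cpu())      # passes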

But Dropout has turned up a problem. I already set an explicit seed before calling the forward function, because I call it more than once, as below (algo1 and algo2 are model classes, both deriving from nn.Module, and both initialized with identical weights).

  expected = torch.tensor([...])
  ...
  torch.manual_seed(432)
  res1 = algo1(x)
  ...
  torch.manual_seed(432)
  res2 = algo2(x)
  ...
  torch.testing.assert_close(res1, res2)      # Test for agreement
  torch.testing.assert_close(res1, expected)  # Test for regression

(expected was created from the res1 output on the first run.)

This works fine on "cpu". The agreement test also passes on "cuda", so I do get the desired within-device reproducibility. My problem is that the random numbers generated on "cuda" are different from those generated on "cpu", so the regression test on the last line fails.

But this does not happen when the weights and biases are randomly generated: those random numbers are identical on cpu and cuda.

Is there a way to make Dropout behave the same way on both devices? If it is impossible, can someone give me some insight as to why? (I.e. why dropout is implemented differently from weight initialization.)

I've tried putting torch.backends.cudnn.benchmark = False at the top of the module, but it made no difference. And I've tried torch.use_deterministic_algorithms(True); it gave me an error telling me to set CUBLAS_WORKSPACE_CONFIG=:16:8, which I did, but it also made no difference.
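
For reference, this is roughly what I had at the top of the test module when trying those settings (shown together here, though I also tried them separately):

  import os
  import torch

  # Set before any CUDA work, as requested by the use_deterministic_algorithms() error.
  os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"

  torch.backends.cudnn.benchmark = False
  torch.use_deterministic_algorithms(True)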


ADDITIONAL:

I've now tested on two machines:

Machine 1: Python 3.6.9, Pytorch 1.10.2 (Jan 2022 release); torch.version.cuda is 10.2; nvidia-smi reports CUDA 11.4 and driver version 470; the GPU is a 1070.

Machine 2: Python 3.10.8, Pytorch 1.13.0+cu116 (Nov 2022 release); torch.version.cuda is 11.6; nvidia-smi reports CUDA 11.4 and driver version 470; the GPU is a 3090.

The cpu results are different between the two machines: again just for Dropout, not for weight initialization, and again reproducibly. (I'm not sure yet whether that is due to the OS, Python version or Pytorch version differences.)

The cuda results are identical across the two machines.

As a workaround, I will write conditional code with three different expected arrays for this one unit test (sketched below), and drop dropout from future tests.
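
Something along these lines (the [...] values are placeholders for the three expected tensors; keying on torch.__version__ for the cpu split is just a guess, since I haven't isolated which difference matters):

  # Workaround sketch: one expected tensor per observed setup.
  if device == "cuda":
      expected = torch.tensor([...])   # identical on both machines
  elif torch.__version__.startswith("1.13"):
      expected = torch.tensor([...])   # cpu on the newer machine
  else:
      expected = torch.tensor([...])   # cpu on the older machine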

Darren Cook
  • I've added cuda tag back in, as this question is very clearly (in my mind) about the intersection of cuda and unit testing. If you want to remove it again can you leave a comment explaining why and what tag should be used instead, thanks! – Darren Cook Feb 22 '23 at 21:03
  • Did you set the `torch.backends.cudnn.deterministic` flag to `True`? – Caridorc Feb 22 '23 at 21:13
  • @Caridorc Just tried it (in isolation) but no difference in results. Also tried it in conjunction with `benchmark=False` and `use_deterministic_algorithms(True)`, but same. – Darren Cook Feb 22 '23 at 23:00
  • You are asking about the behaviour of a GPU-accelerated Pytorch API. That isn't a CUDA programming question, it's a Pytorch programming question. More reductively, can you point to anything in [here](https://docs.nvidia.com/cuda/index.html) your question directly relates to, or is it found [here](https://github.com/pytorch/pytorch) or [here](https://pytorch.org/docs/stable/index.html)? If it is the latter (and I would wager it is), then you don't have a CUDA programming question, you have a Pytorch programming question – talonmies Feb 23 '23 at 00:37

1 Answer


You cannot get deterministic dropout in older versions of Pytorch because of a bug (the suggested workaround of torch.cuda.set_rng_state does not work).

You must update to a recent version, as the bug should be fixed now.

You can run this test, taken from that bug report, to check whether you are on a fixed version:

import torch
import torch.nn as nn

seed = 1
model = nn.Dropout(0.5)
use_cuda = True

for i in range(3):
    # Re-seed both generators each iteration, so every pass should be identical.
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    model.train()
    data = torch.randn(4, 4)
    if i > 0:
        # The freshly drawn data should equal the previous iteration's data.
        print(torch.equal(data, pre_data))
    pre_data = data
    if use_cuda:
        data = data.cuda()
    out = model(data)
    loss = out.sum()
    # With a deterministic dropout mask this value repeats on every iteration.
    print(i, loss.item())

On a fixed version the output should be:

0 -7.718717575073242
True
1 -7.718717575073242
True
2 -7.718717575073242

In the older, buggy versions the value is random from iteration to iteration.
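
To compare the two devices directly with the same seed (which is what the question is really asking about), the snippet can be adapted like this; going by what the question reports, the two outputs do not match:

import torch
import torch.nn as nn

seed = 1
model = nn.Dropout(0.5)
model.train()

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
data = torch.randn(4, 4)
out_cpu = model(data)            # dropout mask drawn from the cpu generator

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
out_gpu = model(data.cuda())     # dropout mask drawn from the cuda generator

print(out_cpu.sum().item(), out_gpu.sum().item())
print(torch.equal(out_cpu, out_gpu.cpu()))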

Caridorc
  • Thank you for the example code, as it actually shows the problem! I get `-7.718717575073242` when use_cuda is True, and `-4.47562313079834` when use_cuda is False. What I am after is a solution where both devices give the same value. Random model weight generation is the same for cpu and cuda. – Darren Cook Feb 23 '23 at 08:42
  • @DarrenCook I also get the `-4.47562313079834` with `use_cuda` set to False. Have you tried installing the latest version of pytorch and testing this again? – Caridorc Feb 23 '23 at 12:51
  • The bug you mentioned was fixed in Oct 2018, and was for before pytorch 1.0; I'm on pytorch 1.10 (Jan 2022) on this machine. I'll test on another machine later, and will then update the question with version numbers. – Darren Cook Feb 23 '23 at 14:34
  • The cpu gives `-10.50810432434082` on a newer version of python/pytorch (see my edit). The cuda value is still the `-7.7` one. – Darren Cook Feb 23 '23 at 17:52
  • @DarrenCook ah so weird, maybe determinism is only guaranteed in the same device and not between devices – Caridorc Feb 24 '23 at 12:16