I've been developing some Transformer models, and writing some fairly comprehensive unit tests. In each unit test class I've been doing:
def setUp(self):
    torch.manual_seed(123)
and it has worked nicely. I've now gone through and added .to(device) everywhere, and I get reproducible results for both "cpu" and "cuda" (the cuda device is an NVIDIA 3090). In other words, the randomly generated nn.Linear() weights and biases are the same on both devices.
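For reference, the sort of check that does agree boils down to something like this (a minimal sketch; the layer sizes are arbitrary):
import torch
import torch.nn as nn

# Sketch: seeded nn.Linear initialization matches between "cpu" and "cuda".
torch.manual_seed(123)
lin_cpu = nn.Linear(8, 8)

torch.manual_seed(123)
lin_gpu = nn.Linear(8, 8).to("cuda")

# Move the cuda copy back to cpu so the comparison is like-for-like.
torch.testing.assert_close(lin_cpu.weight, lin_gpu.weight.cpu())
torch.testing.assert_close(lin_cpu.bias, lin_gpu.bias.cpu())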
But Dropout has turned up a problem. I already set an explicit seed before calling the forward function, because I call it more than once, like below (algo1 and algo2 are model classes, both deriving from nn.Module and both initialized with identical weights).
expected = torch.tensor([...])
...
torch.manual_seed(432)
res1 = algo1(x)
...
torch.manual_seed(432)
res2 = algo2(x)
...
torch.testing.assert_close(res1, res2)      # Test for agreement
torch.testing.assert_close(res1, expected)  # Test for regression
(expected was created from the res1 output on the first run.)
This works fine on "cpu". And the agreement test works on "cuda", meaning I get the desired reproducibility. My problem is that the generated random numbers are different. So the last line fails.
But this is not happening when the weights and biases are randomly generated: those random numbers are consistent between cpu and cuda.
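To make the mismatch concrete, a stripped-down illustration (using functional dropout purely for brevity; the size and the probability are arbitrary) looks roughly like this:
import torch

torch.manual_seed(432)
drop_cpu = torch.nn.functional.dropout(torch.ones(8), p=0.5)

torch.manual_seed(432)
drop_gpu = torch.nn.functional.dropout(torch.ones(8, device="cuda"), p=0.5)

print(drop_cpu)        # reproducible from run to run on cpu
print(drop_gpu.cpu())  # reproducible on cuda, but a different mask than cpu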
Is there a way to make Dropout behave the same way on both devices? If it is impossible, can someone give me some insight as to why? (I.e. why is dropout implemented differently from weight generation?)
I've tried putting torch.backends.cudnn.benchmark = False at the top of the module, but it made no difference. And I've tried torch.use_deterministic_algorithms(True); it gave me an error telling me to set CUBLAS_WORKSPACE_CONFIG=:16:8, which I did, but it also made no difference.
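For reference, those attempts amount to roughly this at the top of the test module:
import os
import torch

# Settings I tried; neither removed the cpu/cuda dropout mismatch.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"  # as requested by the error
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)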
ADDITIONAL:
I've now tested on two machines:
Machine 1: Python 3.6.9, PyTorch 1.10.2 (Jan 2022 release), torch.version.cuda is 10.2, nvidia-smi reports 11.4 and driver version 470. GPU is a 1070.
Machine 2: Python 3.10.8, PyTorch 1.13.0+cu116 (Nov 2022 release), torch.version.cuda is 11.6, nvidia-smi reports 11.4 and driver version 470. GPU is a 3090.
The cpu results are different between the two machines: again just for Dropout, not for weight initialization, and again in a reproducible way. (Not sure yet whether this is due to OS, Python version or PyTorch version differences.)
The cuda results are identical across the two machines.
As a workaround, I will write conditional code with three different expected arrays for this one unit test, and drop dropout from future tests.
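The planned workaround is roughly the following sketch; the hostname and the three expected_* tensors are placeholders, not real values:
import platform
import torch

def pick_expected(device: torch.device,
                  expected_cuda: torch.Tensor,
                  expected_cpu_machine1: torch.Tensor,
                  expected_cpu_machine2: torch.Tensor) -> torch.Tensor:
    # cuda results agree across machines, so one reference tensor suffices.
    if device.type == "cuda":
        return expected_cuda
    # cpu results differ per machine; "machine-one" is a placeholder hostname.
    if platform.node() == "machine-one":
        return expected_cpu_machine1
    return expected_cpu_machine2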