I would like to make a PyTorch calculation reproducible without storing a large random vector for each step. My idea was to first generate a random seed and then re-seed the random number generator, like this:
import torch

seed = torch.rand(1, dtype=torch.float64)  # draw a random seed
torch.manual_seed(seed)  # re-seed, so we get the same vector as when using a stored seed
torch.save(seed, "seedfile")  # store the seed
myvector = torch.randn(myvector.shape)  # the vector I actually need
This way I would only need to store a single float to reproduce the result. But when I use this in a loop, I always get the same result in every iteration.
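Here is a minimal sketch of the loop I mean (the shape (8,), the per-iteration file names, and the iteration count are placeholders for this illustration, not my real values):

import torch

shape = (8,)                                   # placeholder for the real vector shape
for i in range(3):
    seed = torch.rand(1, dtype=torch.float64)  # draw a new "seed" each iteration
    torch.manual_seed(seed)                    # re-seed before generating
    torch.save(seed, f"seedfile_{i}")          # one small file per iteration
    myvector = torch.randn(shape)
    print(myvector)                            # prints the same vector every time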
To explain what I am trying to achieve: suppose I generate a batch of images in a loop, where each image depends on an initialization vector. I can reproduce an image by storing its initialization vector and loading it when I want to redo the calculation (e.g., with other hyperparameters). But since the vector is random anyway, it should be sufficient to store the random seed instead.
To do so, I currently generate a random seed (a float64 in the code above) and then manually seed with it. The manual_seed is not useful in the first run, but it should not be a problem either. When I want to reproduce the image, I do not generate the seed with torch.rand but load it from the file. This way I need less than 1 kB (torch.save has some overhead; the actual payload is just 8 bytes) instead of, e.g., 64 kB for storing the generated vector itself (at 8 bytes per float64, that would be a vector with 8192 entries). The reproduction then looks like this:
loaded_seed = torch.load("seedfile")  # load the stored seed instead of drawing a new one
torch.manual_seed(loaded_seed)  # re-seed exactly as in the original run
myvector = torch.randn(myvector.shape)  # should regenerate the same vector