
The documentation states:

Deterministic mode can have a performance impact, depending on your model.

My question is: what is meant by "performance" here — processing speed, or model quality (i.e., the minimal loss reached)? In other words, when setting manual seeds and making the model behave deterministically, does that lead to a longer training time until the minimal loss is found, or is that minimal loss worse than when the model is non-deterministic?

For completeness' sake, I manually make the model deterministic by setting all of these properties:

import os
import random

import numpy as np
import torch

def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASSHSEED'] = str(seed)
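As a quick sanity check that seeding gives reproducibility, here is a stdlib-only sketch covering just the `random.seed` part of the snippet above (the torch/NumPy calls behave analogously):

```python
import random

def set_seed(seed):
    # Stdlib-only portion of the full snippet; in a real run you would
    # also seed torch, CUDA, and NumPy as shown above.
    random.seed(seed)

set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]

# Re-seeding with the same value replays the exact same random draws.
assert first == second
```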
Bram Vanroy

2 Answers


Performance refers to the run time. cuDNN ships several implementations of many operations; when cudnn.deterministic is set to True, you are telling cuDNN to use only the deterministic implementations (or what are believed to be deterministic). In a nutshell, when you do this, you should expect the same results across runs — whether on the CPU or the GPU — on the same system when feeding the same inputs. Why would it affect performance? cuDNN uses heuristics to choose an implementation, so how it behaves depends on your model; forcing it to be deterministic may affect the runtime because there could have been a faster, non-deterministic implementation available at the same point in the run.
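To see why the fast, non-deterministic implementations can give different results in the first place: parallel reductions may accumulate their terms in a different order on each run, and floating-point addition is not associative. A small stdlib-only illustration (a contrived example, not cuDNN itself):

```python
# Floating-point addition is not associative, so the order in which a
# reduction accumulates its terms can change the result.
xs = [0.1] * 10 + [1e16, -1e16]

left_to_right = sum(xs)                             # small terms first
big_first = sum(sorted(xs, key=abs, reverse=True))  # large terms first

# The ten 0.1s are absorbed when added next to 1e16, so the two orders
# disagree. This is the kind of run-to-run variation a deterministic
# (fixed-order) implementation rules out, potentially at the cost of speed.
assert left_to_right != big_first
```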


Concerning your snippet: I use the exact same seeding, and it has been working well (in terms of reproducibility) for 100+ deep-learning experiments.

ndrwnaguib

"performance" in this context refer to run-time

Shai
    Do you have any sources for this? How can you be sure? – Bram Vanroy May 29 '19 at 08:03
  • When enabling random changes between different training sessions, you obviously won't end up with exactly the same weights, and therefore not *exactly* the same loss/accuracy. However, these differences are minute (see, e.g., [this](https://arxiv.org/pdf/1905.10854.pdf)). On the other hand, requiring exactly the same numerical results requires carrying out the training process in exactly the same manner. This takes **time**: fixing the order of parallel computations does not allow you to enjoy load balancing etc. Thus, in terms of accuracy/loss you end up in roughly the same spot, but after a longer time. – Shai May 29 '19 at 08:12
  • Interestingly, here is a claim that random seeds can have huge repercussions: the same model with a different seed shows a 10% accuracy difference. https://www.linkedin.com/posts/nlp-town_sentimentanalysis-camembert-xlm-activity-6605379961111007232-KJy3 – Bram Vanroy Dec 09 '19 at 14:26