
Nvidia's NVLink accelerates data transfer between several GPUs on the same machine. I train large models on such a machine using PyTorch.

I see why NVLink would make model-parallel training faster, since one pass through a model will involve several GPUs.

But would it accelerate a data-parallel training process using DistributedDataParallel?

acl

1 Answer


How does data-parallel training on k GPUs work?
You split your mini-batch into k parts, each part is forwarded on a different GPU, and gradients are computed on each GPU independently. However (and this is super crucial), the gradients must then be synchronized across all GPUs, via an all-reduce, so that every replica applies the same weight update. That synchronization step is pure inter-GPU communication, and this is where NVLink becomes important for data-parallel training as well.
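A minimal single-node DistributedDataParallel sketch of the above (the model, data sizes, and hyperparameters are placeholders, not anything from the question): the all-reduce is triggered inside backward(), and with the nccl backend that communication runs over NVLink whenever NVLink connects the GPUs.

    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_example.py
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun starts one process per GPU and sets LOCAL_RANK for each
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        # nccl picks NVLink automatically when it is available between GPUs
        dist.init_process_group(backend="nccl")

        model = nn.Linear(1024, 1024).cuda(local_rank)   # placeholder model
        ddp_model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
        loss_fn = nn.MSELoss()

        for step in range(10):
            # each process sees its own part of the mini-batch (random here)
            x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
            y = torch.randn(32, 1024, device=f"cuda:{local_rank}")
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            # backward() runs the gradient all-reduce across GPUs --
            # this is the communication step NVLink speeds up
            loss.backward()
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

The forward pass and the local gradient computation need no inter-GPU traffic; only the all-reduce does, which is why the benefit of NVLink grows with the size of the model's gradients.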

Shai
  • That makes sense, thanks. Is there a rule of thumb for how much faster training will be with NVLink, or does it depend completely on the situation? – acl Jan 18 '21 at 16:21
  • @AGLC the speed is affected by many parameters – Shai Jan 18 '21 at 17:18