Nvidia's NVLink accelerates data transfer between several GPUs on the same machine. I train large models on such a machine using PyTorch.
I can see why NVLink would speed up model-parallel training, since a single forward and backward pass through the model is split across several GPUs.
But would it also accelerate a data-parallel training run that uses DistributedDataParallel?
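
For concreteness, here's a minimal sketch of the kind of single-node DDP setup I mean; the model, data, and hyperparameters below are placeholders, not my actual training code:

```python
# Minimal single-node DDP sketch (placeholder model and data).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; imagine something much larger here.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # Placeholder data; each rank gets a different shard via DistributedSampler.
    dataset = TensorDataset(torch.randn(10_000, 1024), torch.randint(0, 10, (10_000,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            # backward() triggers the gradient all-reduce across GPUs --
            # this inter-GPU communication is the step I suspect NVLink might affect.
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

I launch it with something like `torchrun --nproc_per_node=4 train.py` (script name is just an example), so each GPU runs its own process and holds a full copy of the model.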