Do I really need MPS when running multiple MPI ranks on a single GPU, or is Kepler's Hyper-Q enough by itself?

Question

I would like to run multiple MPI ranks on a single GPU (NVIDIA K20c), and I am aware of both MPS and Kepler's Hyper-Q.

My question is: is Hyper-Q by itself enough for my needs, or do I have to use MPS? According to the Hyper-Q link above, "No extra coding effort is necessary to enable Hyper-Q. All it takes is a Tesla K20 GPU with a CUDA 5 installation and setting an environment variable to let multiple MPI ranks share the GPU – Hyper-Q is then ready to use."

Does this mean that I don't need MPS at all?

P.S. I am also aware of the following question on a similar topic, but it does not seem to answer my question clearly: Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI?

Thanks.

score 4 · Accepted Answer · answered Oct 17 '14 at 20:35

You can run multiple MPI ranks on a single GPU without MPS. In that case, the GPU code from the different ranks will serialize: a given rank's GPU code will only begin to execute when the GPU code associated with the previous rank has completely finished and exited the GPU.

If you want any opportunity for the GPU code from one rank to execute concurrently with the GPU code from another rank, then MPS is necessary. If the GPU code associated with a rank makes full use of the GPU, you are not likely to see much benefit from MPS. The significant benefit appears when one rank's GPU code leaves enough of the GPU idle for another rank's GPU code to execute concurrently.
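To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions, not a measurement: each rank's GPU work is described by a hypothetical `(duration_ms, gpu_fraction)` pair, the function names are made up for illustration, and the concurrent case assumes ideal packing with no scheduling overhead, which a real GPU will not achieve.

```python
def serialized_time(kernels):
    """Total GPU time when each rank's work runs one after another (no MPS)."""
    return sum(duration for duration, _ in kernels)


def ideal_concurrent_time(kernels):
    """Idealized lower bound when work from different ranks may overlap (MPS).

    Assumption: perfect packing. The GPU is never more than 100% busy,
    and no kernel finishes sooner than its own duration.
    """
    work = sum(duration * fraction for duration, fraction in kernels)
    longest = max(duration for duration, _ in kernels)
    return max(work, longest)


# Four ranks, each with a 10 ms kernel that fills a quarter of the GPU.
small = [(10.0, 0.25)] * 4
# Four ranks, each with a 10 ms kernel that fills the whole GPU.
full = [(10.0, 1.0)] * 4

print(serialized_time(small), ideal_concurrent_time(small))  # 40.0 10.0
print(serialized_time(full), ideal_concurrent_time(full))    # 40.0 40.0
```

In this toy model, the small kernels could in principle finish four times sooner when allowed to overlap, while the kernels that already fill the GPU gain nothing.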

Robert Crovella

Thanks for the quick response, but I am still not clear on what role Hyper-Q plays. According to the above link, Hyper-Q itself will "provide 32 work queues between the host and the GPU, enabling multiple MPI processes to run concurrently on the GPU", while the Fermi architecture will serialize the execution of multiple ranks. If Hyper-Q also serializes all ranks, then what is the point of Hyper-Q? Or is Hyper-Q just the hardware feature, which has to work together with MPS to enable the parallelism, an important point that the original Hyper-Q post omits? Thanks! – rsm Oct 17 '14 at 23:25

There is a difference between CUDA tasks coming from a single process and CUDA tasks coming from multiple processes. Hyper-Q removes some artificial barriers to concurrency for requests emanating from a single process. But requests from multiple processes still serialize, due to CUDA behavior unrelated to Hyper-Q. MPS acts as a "funnel" to collect CUDA tasks from multiple processes/ranks and issue them to the GPU as if they came from a single process, so that Hyper-Q can take effect. – Robert Crovella Oct 18 '14 at 03:50
