I am running a PyTorch deep learning job on a GPU, but the job is fairly light.

My GPU has 8 GB of memory, but the job only uses 2 GB, and GPU-Util is close to 0%.

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   36C    P2    45W / 210W |   1155MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Based on GPU-Util and memory usage, it looks like I could fit in another three jobs.

However, I am not sure whether that would affect the overall runtime.

If I run multiple jobs on the same GPU, does that affect the overall runtime?

I tried it once and I think there was some delay.
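To be concrete, by "multiple jobs" I just mean launching independent copies from the shell and comparing wall-clock times, roughly like this (train.py is a placeholder for my actual script):

    # Baseline: time one job running alone on the GPU
    time python train.py

    # Co-located: time two copies sharing the GPU and compare
    # against the baseline
    time ( python train.py & python train.py & wait )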

Brandon Lee

1 Answer

Yes, you can. One option is to use NVIDIA's Multi-Process Service (MPS) to run four copies of your model on the same card.

This is the best description I have found of how to do it: How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?
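As a rough sketch of what that workflow looks like (the linked answer covers the details; train.py is a placeholder for your script, and this assumes you have exclusive access to the card):

    # Start the MPS control daemon for GPU 0
    export CUDA_VISIBLE_DEVICES=0
    nvidia-cuda-mps-control -d

    # Launch four copies of the job; they share the GPU through MPS
    for i in 1 2 3 4; do
        python train.py &
    done
    wait

    # Shut the MPS daemon down once the jobs are done
    echo quit | nvidia-cuda-mps-control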

If you are using your card for inference only, you can host several models (either copies, or different models) on the same card using NVIDIA's TensorRT Inference Server.
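For illustration, a minimal sketch of that setup (the container tag, ports, repository path, and model names below are placeholders; check the server documentation for the exact flags your version expects):

    # Model repository with one sub-directory per model, e.g.
    #   /models/model_a/1/...
    #   /models/model_b/1/...

    # Run the inference server container against the GPU and point it
    # at the repository (image tag is a placeholder)
    docker run --gpus all --rm -p 8000:8000 -p 8001:8001 \
        -v /models:/models \
        nvcr.io/nvidia/tensorrtserver:19.07-py3 \
        trtserver --model-store=/models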

jmsinusa
  • I am pretty upset that I have to do something explicit to get it working, but I guess it is the only option. Thanks. – Brandon Lee Jul 30 '19 at 00:49