12

I only have one GPU (Titan X Pascal, 12 GB VRAM) and I would like to train multiple models, in parallel, on the same GPU.

I encapsulated my model in a single Python program (called model.py), and I included code in model.py to restrict VRAM usage (based on this example). I was able to run up to 3 instances of model.py concurrently on my GPU (with each instance taking a little less than 33% of my VRAM). Mysteriously, when I tried with 4 models I received an error:
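For context, a minimal sketch of this kind of per-process VRAM cap in the TF 1.x API (the fraction and the model code below are illustrative placeholders, not the actual model.py):

```python
import tensorflow as tf  # TF 1.x

# Cap this process at a little under a third of the card's VRAM so that
# three instances can share the 12 GB GPU.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build the graph and run training here ...
    pass
```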

2017-09-10 13:27:43.714908: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-09-10 13:27:43.714973: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-09-10 13:27:43.714988: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)

I later observed on the TensorFlow GitHub that people seem to think it is unsafe to have more than one TensorFlow process running per GPU. Is this true, and is there an explanation for why this is the case? Why was I able to have 3 TensorFlow processes running on the same GPU but not 4?

Adamo
  • This is not the case that TensorFlow is optimized for, i.e. all the testing and usage at Google is done with a single TensorFlow process per GPU. This makes it likely for there to be bugs in this scenario. Even if you get it to run, I expect it to have a significant cost penalty: running 2 TF processes in parallel on a single GPU will be significantly slower than running them in sequence – Yaroslav Bulatov Sep 10 '17 at 20:13
  • Strangely enough, this is not the case (at least in the experiments I have run). For example, in the case of 3 processes, each process took ~11% longer than a single process with identical VRAM usage. – Adamo Sep 10 '17 at 20:37
  • I see; I suspect that the GPU is not the bottleneck in such a situation (i.e., GPU utilization is low) – Yaroslav Bulatov Sep 10 '17 at 20:46
    "Why was I able to have 3 tensorflow processes running on the same GPU and not 4?" you said yourself that each instance take "a little less than 33%" of the GPU memory; seems that you are just running out of video memory with 4 processes (I have seen similar errors myself due to low memory). – jdehesa Sep 11 '17 at 09:00
  • Actually, TF will run just fine in multiple instances on the same device (as long as resources are available, of course). The only thing you might want to take care of is setting [`gpu_options.allow_growth=True`](https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth) to prevent TF from allocating most of your GPU's RAM by default when you create a Session – GPhilo Sep 11 '17 at 09:46
  • @jdehesa I adjusted the memory usage to be a little less than 25% (per process) when I tried with 4. I don't think it's a memory error, I've gotten those before and they explicitly state that memory could not be allocated. Sorry for the confusion in my original post, I hope this clears things up. – Adamo Sep 11 '17 at 16:51
  • @GPhilo I don't necessarily want to allow memory growth, as this can make the processes run slower in some cases. I would prefer to allocate the memory in predetermined blocks when I run my processes. Do you have a citation/link that explains that it is safe to run multiple tensorflow processes on the same gpu? I can't find any "official" information that points one way or the other. – Adamo Sep 11 '17 at 16:57
  • @Adamo Have you checked whether other things are also taking GPU memory? For example, Xorg? Assuming you are using nvidia, you can use nvidia-smi to check. If there are, your memory allocation may be too aggressive. – Joshua Chia Nov 30 '18 at 05:27
  • I did check, nothing else was using GPU memory. – Adamo May 24 '20 at 22:51
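A minimal sketch of the two session configurations discussed in the comments above (TF 1.x API; the specific fraction is illustrative):

```python
import tensorflow as tf  # TF 1.x

# Option A: grow the allocation on demand instead of grabbing most of the
# GPU's RAM up front (GPhilo's suggestion).
growth_config = tf.ConfigProto()
growth_config.gpu_options.allow_growth = True

# Option B: pre-allocate a fixed block per process, e.g. just under 25%
# so that four processes could in principle share the card.
fixed_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.24))

# Either config is passed in when the Session is created:
sess = tf.Session(config=growth_config)
```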

2 Answers

6

In short: yes, it is safe to run multiple processes on the same GPU (as of May 2017). It was previously unsafe to do so.

Link to tensorflow source code that confirms this
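For illustration, launching several capped training processes on one GPU could look like this (a sketch; model.py and its --mem-fraction flag are hypothetical stand-ins, not a real interface):

```python
import subprocess

# Start three independent training processes on the same GPU; each one is
# assumed to cap its own VRAM use so that all three fit at once.
procs = [
    subprocess.Popen(["python", "model.py", "--mem-fraction", "0.3"])
    for _ in range(3)
]
for p in procs:
    p.wait()  # block until every training run has finished
```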

Adamo
-2


Whether this works depends on the amount of video memory available.

In my case, I have 2 GB of video memory in total, and a single instance reserves about 1.4 GB. Here is what happened when I tried to run another TensorFlow script while the speech recognition training was already running:

2018-08-28 08:52:51.279676: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.2415
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.65GiB
2018-08-28 08:52:51.294948: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2018-08-28 08:52:55.643813: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 08:52:55.647912: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]      0
2018-08-28 08:52:55.651054: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0:   N
2018-08-28 08:52:55.656853: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1409 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute
capability: 5.0)

I got the following error in the speech recognition training, which completely terminated the script (I think, according to this, it is related to running out of video memory):

2018-08-28 08:53:05.154711: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED ::
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
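One way to fail early instead of crashing mid-training is to check the free VRAM before launching a second process (a sketch, assuming nvidia-smi is installed):

```python
import subprocess

def free_gpu_memory_mib(gpu_index=0):
    """Ask nvidia-smi how many MiB are free on the given GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.free",
        "--format=csv,noheader,nounits",
        "-i", str(gpu_index),
    ])
    return int(out.decode().strip())

# On this 2 GB card, ~1.4 GB is already reserved, leaving roughly 0.6 GB,
# which is not enough for a second training process.
if free_gpu_memory_mib() < 1400:
    raise RuntimeError("Not enough free VRAM for another TensorFlow process")
```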
FindOutIslamNow
  • Did your program fail simply because it ran out of GPU memory? If so, that's not what the original question is about. In the context of the original question, the programs can already run individually with restricted GPU memory, and the total allocations sum to less than 100%. – Joshua Chia Nov 30 '18 at 05:23