
I am trying to run a Python script on a specific GPU on our server, which has four GPUs. In a virtual environment with Python 3.8 and TensorFlow 2.2, the script runs correctly on the chosen GPU once I add the following lines at the top of the script:

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # run the code on a specified GPU

Many threads recommend this approach for running Python scripts on a specific GPU, such as here and here.

However, when I use the same approach for another Python script in a second virtual environment (with older versions: Python 3.6.9 and TensorFlow 1.12), the script runs on the CPU instead of the GPU.
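
A quick diagnostic can help separate two possible causes here: a TensorFlow build without CUDA support, or a CUDA-enabled build that cannot see the GPU. The sketch below is not part of the original scripts. `gpu_report` is a hypothetical helper name, and it assumes `tf.test.is_built_with_cuda()` and `tf.test.is_gpu_available()`, which exist in TensorFlow 1.x (the latter is deprecated in 2.x).

```python
import os

# Same selection as in the working environment; set before TensorFlow is imported.
os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2")


def gpu_report():
    """Hypothetical helper: summarize what TensorFlow can see in this environment."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import tensorflow as tf
    except ImportError as exc:
        # Covers both "not installed" and "a CUDA shared library is missing".
        report["import_error"] = str(exc)
        return report
    report["tf_version"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpu_available"] = tf.test.is_gpu_available()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `False`, the installed package is probably a CPU-only build. If `import_error` names a `.so` file, the installed CUDA libraries likely do not match what this TensorFlow version expects.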

How can I run Python code on a specific GPU in the second virtual environment?

Mohsen Ali

1 Answer


You can set the `CUDA_VISIBLE_DEVICES` environment variable with `export` to define which GPUs are visible to the application. For example, to make only GPUs 0 and 2 visible, run `export CUDA_VISIBLE_DEVICES=0,2` before starting the script.
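
The same restriction can also be applied from Python when launching the script, which leaves the shell environment untouched. This is a minimal sketch, assuming the script is started as a child process. `gpu_env` and `run_on_gpus` are hypothetical helper names, and the script path in the usage comment is only a placeholder.

```python
import os
import subprocess
import sys


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # No spaces around the commas: the value is a plain comma-separated list.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_on_gpus(script_args, gpu_ids):
    """Run `python <script_args...>` with only `gpu_ids` visible; return the exit code."""
    result = subprocess.run([sys.executable, *script_args], env=gpu_env(gpu_ids))
    return result.returncode


# Example (placeholder path): run_on_gpus(["my_script.py"], [0, 2])
```

This is roughly what the shell does for a one-off prefix such as `CUDA_VISIBLE_DEVICES=0,2 python my_script.py`: the variable applies only to that child process.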

sakumoil
  • It does not work. `export` gives another error, saying that the path is not a valid identifier. – Mohsen Ali Jan 17 '22 at 06:54
  • Can you paste the error here? – sakumoil Jan 17 '22 at 07:21
  • -bash: export: `algorithm/stack_predict.py': not a valid identifier; -bash: export: `--load_model_dir=./results/stack_train_2022-01-14_18-46-46_467/': not a valid identifier – Mohsen Ali Jan 17 '22 at 07:25
  • What is the full command you run when you encounter this error? – sakumoil Jan 17 '22 at 07:32
  • export CUDA_VISIBLE_DEVICES = 0,1,2,3, python algorithm/stack_predict.py --load_model_dir=./results/stack_train – Mohsen Ali Jan 17 '22 at 07:36
  • Did you install TensorFlow 1.12 via `pip install tensorflow==1.12.0` or `pip install tensorflow-gpu==1.12.0`? If you used the former, try the latter. – sakumoil Jan 17 '22 at 07:37
  • Also, you should run `export CUDA_VISIBLE_DEVICES=0,1,2,3` first (no spaces around `=` and no trailing comma), and then run your Python file as a separate command. – sakumoil Jan 17 '22 at 07:38
  • I installed tensorflow-gpu==1.12.0 and ran the commands separately. But I got this error: ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory – Mohsen Ali Jan 17 '22 at 07:44
  • Your CUDA version is not compatible with TF 1.12. See this table: https://www.tensorflow.org/install/source#gpu to find the correct CUDA and NVIDIA driver version and install them. – sakumoil Jan 17 '22 at 07:48
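
The `ImportError` above means the dynamic loader could not find the CUDA 9.0 cuBLAS library. A small check like the following can confirm which libraries are missing before TensorFlow is imported. It is a sketch under assumptions: `missing_libraries` is a hypothetical helper, and the library names other than `libcublas.so.9.0` are assumed from the CUDA 9.0 / cuDNN 7 pairing that the linked table lists for TF 1.12. Adjust them to whatever the table shows for your version.

```python
import ctypes

# Assumed library names for TensorFlow 1.12 (CUDA 9.0, cuDNN 7); adjust as needed.
REQUIRED_LIBS = ["libcudart.so.9.0", "libcublas.so.9.0", "libcudnn.so.7"]


def missing_libraries(names=REQUIRED_LIBS):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Not loadable:", ", ".join(missing))
    else:
        print("All required CUDA libraries were found.")
```

If a library is installed but still reported as not loadable, its directory may be missing from `LD_LIBRARY_PATH`.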