I am using AWS to train a CNN on a custom dataset. I launched a p2.xlarge instance, uploaded my (Python) scripts to the virtual machine, and I am running my code via the CLI.

I activated a virtual environment for TensorFlow(+Keras2) with Python3 (CUDA 10.0 and Intel MKL-DNN), which was one of the default options on the AWS instance.

I am now running my code to train the network, but it feels like the GPU is not 'activated'. The training goes just as fast (slow) as when I run it locally with a CPU.

This is the script that I am running:

https://github.com/AntonMu/TrainYourOwnYOLO/blob/master/2_Training/Train_YOLO.py

I also tried to alter it by putting `with tf.device('/device:GPU:0'):` after the parser (line 142) and indenting everything underneath it. However, this doesn't seem to have changed anything.
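
For reference, here is a minimal standalone version of the pattern I tried (TF 1.x API, matching the CUDA 10.0 environment); `log_device_placement` makes TensorFlow log which device each op actually runs on:

import tensorflow as tf

# Pin the ops created in this scope to the first GPU.
with tf.device('/device:GPU:0'):
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    c = a + b

# TF 1.x: log_device_placement prints the device assignment of every op.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))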

Any tips on how to activate the GPU (or check if the GPU is activated)?

Y.Ynot

2 Answers

Check out this answer for listing the available GPUs:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # list_local_devices() enumerates every device TensorFlow can see;
    # keep only the entries whose device_type is 'GPU'.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
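
For example (assuming the TF 1.x build that ships with the CUDA 10.0 environment; on TF 2.x, `tf.config.list_physical_devices('GPU')` does the same job):

import tensorflow as tf

print(get_available_gpus())        # e.g. ['/device:GPU:0'] when a GPU is visible
print(tf.test.is_gpu_available())  # True only if TF is built with CUDA and sees a GPU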

If you also have PyTorch installed, you can use its CUDA utilities to check the current device and, if necessary, set it.

import torch

print(torch.cuda.is_available())    # True if PyTorch can see a CUDA device
print(torch.cuda.current_device())  # index of the currently selected GPU
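
To select a device explicitly, here is a short sketch (`torch.cuda.set_device` is the call linked in the comments below):

import torch

if torch.cuda.is_available():
    torch.cuda.set_device(0)              # make GPU 0 the default for torch CUDA ops
    print(torch.cuda.get_device_name(0))  # e.g. 'Tesla K80' on a p2.xlarge
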
justMiles
  • Thanks for your answer! I am able to view the available GPUs, also by running `nvidia-smi`. So I do know that there is an available GPU; it is just not active when I run my code. And that's exactly the problem I want to solve. – Y.Ynot Sep 02 '20 at 09:53
  • Do you get any errors [setting the device](https://pytorch.org/docs/stable/cuda.html#torch.cuda.set_device)? – justMiles Sep 02 '20 at 15:57
  • Hi Miles. Thanks again for your comment! I tried doing that, and I didn't get any errors. I ran the following commands, with these outputs: `torch.cuda.is_available()` --> True, `torch.cuda.is_initialized()` --> False, `torch.cuda.set_device(0)`, `torch.cuda.is_initialized()` --> True. However, the processing speed does not go up and `nvidia-smi` still gives me 'no running processes', unfortunately. – Y.Ynot Sep 03 '20 at 13:35
  • I also ran the first command you proposed, and weirdly it doesn't return a GPU. The output is: ```[name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456 locality { } incarnation: 15573112437867445376 , name: "/device:XLA_CPU:0" device_type: "XLA_CPU" memory_limit: 17179869184 locality { } incarnation: 9660188961145538128 physical_device_desc: "device: XLA_CPU device" ]``` – Y.Ynot Sep 03 '20 at 13:40
  • If the first command doesn't return anything, the GPU isn't available to TensorFlow. There can be a couple of causes for this, but I would 1) check that the GPU is visible to the OS: `lspci | grep VGA` should return the NVIDIA GPU. 2) check that the versions of tensorflow and cuda support your GPU. What AMI are you using? – justMiles Sep 08 '20 at 15:04

In the end, it had to do with my TensorFlow package! I had to uninstall `tensorflow` and install `tensorflow-gpu` (`pip uninstall tensorflow`, then `pip install tensorflow-gpu`). After that, the GPU was automatically activated.

For documentation see: https://www.tensorflow.org/install/gpu
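
To double-check which build is active after the switch (a quick sketch; `tf.test.is_built_with_cuda()` exists in both TF 1.x and 2.x):

import tensorflow as tf

print(tf.__version__)
print(tf.test.is_built_with_cuda())  # False => the CPU-only 'tensorflow' build is still installed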

Y.Ynot