
I am trying to train a CNN model in Keras with the TensorFlow backend. The problem is that it won't run on my GPU (i.e. there is no speed-up compared to when I was previously running TensorFlow on the CPU), despite the fact that I installed tensorflow-gpu rather than the plain tensorflow package, as the solution in this link describes.


I have installed CUDA version 9.0 and cuDNN version 7.1, and encountered no issues, following this link.

I have also made sure that TensorFlow is able to detect my GPU:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 18032619952595111467
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3187841433
locality {
  bus_id: 1
}
incarnation: 7706357628903921514
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
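(For reference, this listing is what `device_lib.list_local_devices()` prints in TensorFlow 1.x. The `memory_limit` field is reported in bytes, so a quick conversion shows how much memory TensorFlow can use on this card — just under 3 GiB:)

```python
# memory_limit in the device listing above is in bytes; convert it to GiB
# to see how much memory TensorFlow reserved on the GTX 1050.
gpu_memory_limit = 3187841433  # value copied from the listing above

print(f"{gpu_memory_limit / 2**30:.2f} GiB")  # prints "2.97 GiB"
```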

After some further research, I found this link; it turns out my script wasn't loading the local cuDNN libraries.


EDIT

Below is a screenshot of the nvidia-smi.exe output; it looks the same before and after I started the CNN training (i.e. no change).

[screenshot: nvidia-smi output]

EDIT v2

OK, I have made a little progress. It turns out there is something weird about running the script in Spyder, so I ran it with python my_python_script.py in Command Prompt instead. I can now say for sure that the GPU is recognized by TensorFlow, as Keras assigns operations to the GPU. However, I still see 0% utilization.

[screenshot: nvidia-smi showing 0% GPU utilization]

I don't know what else could be wrong. Please help.

Thanks in advance!

user2552108
  • Can you please try running `nvidia-smi` in a terminal and see if there is any change in memory usage and processes when you run your CNN? – skr Mar 08 '18 at 03:58
  • @skr_robo I have made the edit, although I am not sure exactly how to interpret the results – user2552108 Mar 08 '18 at 04:25
  • My bad. I didn't notice that. Do you know what process corresponds to `PID 13636`? That looks like Keras CNN code. Also, can you verify `CUDA` installation by running `nvcc -V`? – skr Mar 08 '18 at 04:32
  • I think that is Spyder editor. And yes, `CUDA` is installed, the output of `nvcc -V` is `nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2017 NVIDIA Corporation Built on Fri_Sep__1_21:08:32_Central_Daylight_Time_2017 Cuda compilation tools, release 9.0, V9.0.176` – user2552108 Mar 08 '18 at 04:38
  • Spyder is extensively used to run Keras CNN code. Are you running your Keras CNN code using Spyder IDE? – skr Mar 08 '18 at 04:39
  • Yes, I am. Should I do it differently? – user2552108 Mar 08 '18 at 04:41
  • Would you like to discuss this on chat? – skr Mar 08 '18 at 04:43
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/166434/discussion-between-skr-robo-and-user2552108). – skr Mar 08 '18 at 04:44
  • Still nothing :/ – user2552108 Mar 08 '18 at 07:26
  • Start your program from a python script. Tensorflow always tells you which hardware it will train the model on: either it tells you that it found your GPU, or that it didn't. If there is nothing like this, it's likely that tensorflow (CPU) is still installed and being used somewhere. Also, check whether the LD_LIBRARY_PATH and CUDA_HOME environment variables are set. – dennis-w Mar 08 '18 at 07:41
  • @dennis-ec I have made sure all of those points you mentioned, please see my EDIT v2 – user2552108 Mar 08 '18 at 09:31
  • Seems like you installed tensorflow on Windows. I have no experience with that. My last idea: maybe your GPU memory is too limited for any graph operation from your model. How about trying a simple, small model and seeing if the GPU is used? Also, you can get detailed device runtime information in TensorBoard for benchmarking. Maybe you'll find something there. – dennis-w Mar 08 '18 at 10:04

1 Answer


So the reason the GPU wasn't being used is that the images were too big to be loaded. I resized the training and validation images to just 64x64, and now Keras uses my GPU, and the speed-up is most satisfying.
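For a rough sense of why the resize helps (a back-of-the-envelope sketch — the original image size and the batch size of 32 below are assumptions, not taken from the question), the memory needed just to hold one float32 input batch scales with height × width:

```python
def input_batch_bytes(batch, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of RGB images as float32 (4 bytes/value)."""
    return batch * height * width * channels * bytes_per_value

# Hypothetical original size vs. the 64x64 size used after the fix.
before = input_batch_bytes(32, 1024, 1024)  # 402653184 bytes, i.e. 384 MiB
after = input_batch_bytes(32, 64, 64)       # 1572864 bytes, i.e. 1.5 MiB

print(before // after)  # prints 256, a 256x reduction
```

Intermediate convolutional activations scale the same way, so on a card with roughly 3 GiB of memory, large inputs can exhaust it very quickly.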

Having said that, I am not sure why Keras or TensorFlow didn't throw an out-of-memory exception in the first place and spare me days of searching Google for the answer.

user2552108
  • That doesn't make sense? If the images were truly too large, then you should certainly get an OOM error.... – Kevin Aug 15 '19 at 13:09