
I am performing deep learning on a machine with 4 GPUs. During training, the third GPU is consistently lost (the error "GPU lost" comes up, and the logs indicate it is this specific GPU). I am assuming it is a thermal issue and the GPU is becoming unseated.

Before I fix this hardware issue, I would like to continue using the other 3 GPUs ('/gpu:0', '/gpu:1', '/gpu:3'). Is there a way to specify, in Keras, that these are the GPUs I want to use (or, alternatively, to ignore '/gpu:2')?

I have seen a lot on specifying GPU vs. CPU usage, and on selecting a single GPU on a multi-GPU machine, but not on this specific issue (isolating a subset of the available GPUs).

  • Does this answer your question? [How do I select which GPU to run a job on?](https://stackoverflow.com/questions/39649102/how-do-i-select-which-gpu-to-run-a-job-on) – Daraan Dec 03 '22 at 14:28

1 Answer


You can try to use the CUDA_VISIBLE_DEVICES environment variable:

import os
# Only GPUs 0, 1 and 3 are exposed to CUDA; GPU 2 is hidden entirely
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,3"

Set this before importing keras/tensorflow, since the variable is read when the CUDA runtime is initialized; changing it afterwards has no effect.
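For illustration, here is a minimal sketch of what the top of the training script might look like, assuming a TensorFlow 2.x backend (tf.config.list_physical_devices is not available in older 1.x versions, where you would use device_lib.list_local_devices() instead):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,3"   # must come before the TF/Keras import

import tensorflow as tf

# Should list 3 GPUs; note they are renumbered 0, 1, 2 inside TensorFlow,
# so the physical '/gpu:3' appears as the logical '/gpu:2'.
print(tf.config.list_physical_devices('GPU'))

Any multi-GPU setup built after this point (e.g. tf.distribute.MirroredStrategy) will then only ever see the three healthy cards.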
