I am a newbie to GPU-based training and deep learning models. I am running a cDCGAN (conditional DCGAN) in TensorFlow on my two NVIDIA GTX 1080 GPUs. My dataset consists of around 320,000 images of size 64*64 with 2,350 class labels. My batch size is very small (10), because I was getting an OOM error (OOM when allocating tensor with shape[32,64,64,2351]) with a larger batch size.
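To give a sense of scale, here is a rough estimate of how big that tensor is, assuming float32 and assuming the 2351 channels are the 64*64 image (1 channel) plus the 2350 class labels tiled as one-hot channels (which is what the shape in the error suggests):

    # shape from the OOM message: [32, 64, 64, 2351]
    batch, h, w = 32, 64, 64
    channels = 1 + 2350          # 1 image channel + 2350 one-hot label channels (my assumption)
    bytes_per_float = 4          # float32

    tensor_bytes = batch * h * w * channels * bytes_per_float
    print("one such tensor: %.2f GiB" % (tensor_bytes / 1024.0 ** 3))
    # prints ~1.15 GiB for a single activation, before any conv outputs or gradients

So even one copy of that input is already well over 1 GiB of the 8 GB on a GTX 1080.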
The training is very slow, which I understand is down to the batch size (correct me if I am wrong). If I run watch -n 1 nvidia-smi, I get the following output.
GPU:0 is the one mainly used: its Volatile GPU-Util is around 0%-65%, whereas GPU:1 stays at 0%-3% at most. The performance state (Perf) of GPU:0 is always P2, whereas GPU:1 is mostly P8 and sometimes P2. I have the following questions.
1) Why is GPU:1 barely used, and why does it mostly sit in P8 even though it is almost idle? (I am not placing any work on the second GPU explicitly; see the sketch after these questions.)
2) Is the slow training down to my small batch size only, or can there be other reasons?
3) How can I improve performance?
4) How can I avoid the OOM error with a bigger batch size?
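For context, I am not doing anything explicit to place ops on both GPUs, which I suspect is related to question 1. My understanding is that something like the tower pattern below would be needed to actually use GPU:1. This is only a minimal sketch in TF 1.x style with a stand-in loss instead of my real cDCGAN losses, not my actual code:

    import tensorflow as tf

    def tower_loss(images, labels):
        # stand-in for the per-GPU cDCGAN forward pass and loss, NOT my real model
        w = tf.get_variable('w', [3, 3, 1, 8])
        h = tf.nn.conv2d(images, w, [1, 2, 2, 1], 'SAME')
        return tf.reduce_mean(h) + 0.0 * tf.reduce_mean(labels)

    images = tf.placeholder(tf.float32, [32, 64, 64, 1])
    labels = tf.placeholder(tf.float32, [32, 2350])
    img_splits = tf.split(images, 2)    # half of the batch for each GPU
    lbl_splits = tf.split(labels, 2)

    optimizer = tf.train.AdamOptimizer(2e-4)
    tower_grads = []
    for i in range(2):
        # pin each tower's ops to one GPU; weights are shared via variable reuse
        with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=(i > 0)):
            loss = tower_loss(img_splits[i], lbl_splits[i])
            tower_grads.append(optimizer.compute_gradients(loss))

    # average the two towers' gradients and apply them once
    avg_grads = [(tf.add_n([g for g, _ in pair]) / 2.0, pair[0][1])
                 for pair in zip(*tower_grads)]
    train_op = optimizer.apply_gradients(avg_grads)

Is something like this required, or is there a simpler way to spread the work over both cards?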
Edit 1:
Model details are as follows:
Generator:
I have 4 layers (fully connected, UpSampling2d-conv2d, UpSampling2d-conv2d, conv2d).
W1 is of shape [X+Y, 16*16*128], i.e. (2450, 32768); w2 is [3, 3, 128, 64], w3 is [3, 3, 64, 32], and w4 is [3, 3, 32, 1].
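Roughly in code, the generator looks like this (a simplified TF 1.x-style sketch built from the shapes above, not my exact code: biases and batch norm are left out, and nearest-neighbour upsampling is assumed for the UpSampling2d steps):

    import tensorflow as tf

    def generator(z_and_label):                # [batch, 2450] = noise concatenated with one-hot label
        w1 = tf.get_variable('g_w1', [2450, 16 * 16 * 128])
        h = tf.nn.relu(tf.matmul(z_and_label, w1))
        h = tf.reshape(h, [-1, 16, 16, 128])

        # 16x16 -> 32x32: upsample, then 3x3 conv (w2)
        h = tf.image.resize_nearest_neighbor(h, [32, 32])
        w2 = tf.get_variable('g_w2', [3, 3, 128, 64])
        h = tf.nn.relu(tf.nn.conv2d(h, w2, [1, 1, 1, 1], 'SAME'))

        # 32x32 -> 64x64: upsample, then 3x3 conv (w3)
        h = tf.image.resize_nearest_neighbor(h, [64, 64])
        w3 = tf.get_variable('g_w3', [3, 3, 64, 32])
        h = tf.nn.relu(tf.nn.conv2d(h, w3, [1, 1, 1, 1], 'SAME'))

        # final 3x3 conv (w4) down to one 64x64 output channel
        w4 = tf.get_variable('g_w4', [3, 3, 32, 1])
        return tf.nn.tanh(tf.nn.conv2d(h, w4, [1, 1, 1, 1], 'SAME'))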
Discriminator:
It has five layers (conv2d, conv2d, conv2d, conv2d, fully connected).
w1 is [5, 5, X+Y, 64], i.e. (5, 5, 2351, 64); w2 is [3, 3, 64, 64], w3 is [3, 3, 64, 128], w4 is [2, 2, 128, 256], and w5 (the fully connected layer) is [16*16*256, 1].
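Sketched the same way, the discriminator is below; the strides are my guess (2, 2, 1, 1), chosen so the spatial size goes 64 -> 32 -> 16 -> 16 -> 16 and matches the 16*16*256 fully connected input:

    import tensorflow as tf

    def discriminator(x_and_label):            # [batch, 64, 64, 2351] = image plus label channels
        w1 = tf.get_variable('d_w1', [5, 5, 2351, 64])
        h = tf.nn.leaky_relu(tf.nn.conv2d(x_and_label, w1, [1, 2, 2, 1], 'SAME'))   # 64 -> 32
        w2 = tf.get_variable('d_w2', [3, 3, 64, 64])
        h = tf.nn.leaky_relu(tf.nn.conv2d(h, w2, [1, 2, 2, 1], 'SAME'))             # 32 -> 16
        w3 = tf.get_variable('d_w3', [3, 3, 64, 128])
        h = tf.nn.leaky_relu(tf.nn.conv2d(h, w3, [1, 1, 1, 1], 'SAME'))             # 16 -> 16
        w4 = tf.get_variable('d_w4', [2, 2, 128, 256])
        h = tf.nn.leaky_relu(tf.nn.conv2d(h, w4, [1, 1, 1, 1], 'SAME'))             # 16 -> 16
        h = tf.reshape(h, [-1, 16 * 16 * 256])
        w5 = tf.get_variable('d_w5', [16 * 16 * 256, 1])
        return tf.matmul(h, w5)                # raw logit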