
I am training a large network such as ResNet with a very small batch size, say 25. When I do that, I get very low and oscillating GPU utilization. I have seen several posts about low GPU utilization in PyTorch, but they all suggest one of the following:

“Increase the batch size.”: The batch size here is not a computational choice; I need it to stay small.

“Increase the number of workers, as data loading might be the bottleneck.”: First, data loading is not the bottleneck, since it takes much less time than the GPU computation (see the timing sketch at the end of the post). Second, increasing the number of workers actually increases the running time of my code. Third, the low and oscillating GPU utilization persists even after increasing the number of workers. Hence, this suggestion does not apply either.

“Set shuffle = False.”: Again, not a feasible option, since I do need to shuffle my data.

Do you have any other suggestions for using the GPU more effectively when the batch size has to be small?
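
For reference, here is roughly the kind of setup and timing check I mean. This is only a sketch, not my actual code: the ResNet-50 model, the FakeData dataset, and num_workers=4 are placeholders, and the timing around the loop is simply how I convinced myself that data loading takes much less time than the GPU work.

    import time
    import torch
    import torchvision

    device = torch.device("cuda")
    model = torchvision.models.resnet50().to(device)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.CrossEntropyLoss()

    # Placeholder dataset; in reality I load my own data here.
    dataset = torchvision.datasets.FakeData(
        size=1000, image_size=(3, 224, 224),
        transform=torchvision.transforms.ToTensor(),
    )
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=25, shuffle=True,
        num_workers=4, pin_memory=True,
    )

    load_time, compute_time = 0.0, 0.0
    t0 = time.perf_counter()
    for images, labels in loader:
        t1 = time.perf_counter()
        load_time += t1 - t0          # time spent waiting on the DataLoader

        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()      # wait for the GPU so the timing is meaningful
        t0 = time.perf_counter()
        compute_time += t0 - t1       # forward + backward + optimizer step

    print(f"data loading: {load_time:.1f}s  GPU compute: {compute_time:.1f}s")

With this kind of measurement, data loading accounts for only a small fraction of each iteration in my case, yet nvidia-smi still shows the utilization oscillating.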

kko
  • The GPU advantage really kicks in when you have large batch sizes. If you want to train with small batch sizes (because that is what you need to do to address some research question), then maybe you should look into parallelising your training over multiple machines. Do you have access to a CPU cluster? With batch size 25, there doesn't seem to be much point in bothering with GPUs at all. – mbpaulus Jan 02 '18 at 13:59
  • I run my model on Windows and I also have this problem. – Hong Cheng Jan 06 '20 at 05:35
  • I also have this issue – Taras Kucherenko Feb 11 '20 at 15:40

0 Answers