I am training a large network (e.g., a ResNet) with a very small batch size, say 25. When I do that, I get very low and oscillating GPU utilization. I have seen several posts about low GPU utilization in PyTorch, but they all suggest one of the following (a minimal sketch of my setup is at the end of this post):
“Increase the batch size.”: The batch size is not a computational choice in my case; I need it to stay small.
“Increase the number of workers, as data loading might be the bottleneck.”: First, data loading is not the bottleneck; it takes far less time than the forward/backward pass (timed roughly as in the sketch at the end of this post). Second, increasing the number of workers actually increases the running time of my code. Third, the low and oscillating GPU utilization persists even with more workers. Hence, this suggestion does not apply either.
“Set shuffle = False.”: Again, not feasible, as I need my data to be shuffled.
Do you have any other suggestions for using the GPU more effectively when the batch size has to be small?
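
For reference, here is a minimal sketch of my setup and of how I compared data-loading time with GPU compute time. It is a simplification: the random `TensorDataset` and the torchvision `resnet50` are stand-ins for my actual data and network, and the hyperparameters (learning rate, number of workers, etc.) are illustrative rather than the ones I actually use.

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50

# Stand-ins for my real dataset/model: random images and a torchvision ResNet-50.
device = torch.device("cuda")
model = resnet50(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

dataset = TensorDataset(torch.randn(500, 3, 224, 224),
                        torch.randint(0, 10, (500,)))
loader = DataLoader(dataset, batch_size=25, shuffle=True, num_workers=4)

# Rough split of wall-clock time between data loading and GPU compute.
data_time, gpu_time = 0.0, 0.0
end = time.time()
for images, targets in loader:
    data_time += time.time() - end              # time spent waiting on the DataLoader
    images, targets = images.to(device), targets.to(device)

    start = time.time()
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()                    # finish GPU work before stopping the clock
    gpu_time += time.time() - start

    end = time.time()

print(f"data loading: {data_time:.1f}s  |  GPU compute: {gpu_time:.1f}s")
```

With this kind of measurement, the data-loading total comes out much smaller than the GPU total in my case, which is why I don't think the DataLoader is the problem.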