I am trying to train the U2PL method (https://github.com/Haochen-Wang409/U2PL) on my custom dataset, and I ran into an OOM error when training with train_sup.py at image size 320x320 and batch size 4. I am using two GPUs.
"Tried to allocate 8.38 GiB (11.91 GiB total capacity; 1.28 GiB already allocated; 8.38 GiB free; 2.74 GiB reserved in total by PyTorch)"
The weird thing is that with either a smaller or a larger batch size there is no OOM error, and with a larger or smaller image size there is no OOM error either; only this exact combination fails (see the probe below).
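To narrow this down, I wrote a rough standalone probe (my own script; `resnet101` with 21 classes is just a stand-in for the actual U2PL backbone and head) that sweeps the batch sizes and crop sizes I tried and records the peak CUDA memory of one forward/backward pass:

```python
import torch
import torchvision

# Rough probe, my own script: sweep (batch size, crop size) combinations
# and record peak CUDA memory for one forward/backward pass.
# resnet101 with 21 classes is only a stand-in for the real U2PL model.
model = torchvision.models.resnet101(num_classes=21).cuda()
criterion = torch.nn.CrossEntropyLoss()

for bs, size in [(2, 320), (4, 320), (8, 320), (4, 256), (4, 384)]:
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(bs, 3, size, size, device="cuda")
    y = torch.randint(0, 21, (bs,), device="cuda")
    criterion(model(x), y).backward()
    model.zero_grad(set_to_none=True)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"bs={bs} size={size}: peak {peak_gib:.2f} GiB")
```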
I am using the official U2PL code, and mixed precision is not used anywhere in it.
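If mixed precision turns out to be a viable workaround, this is roughly how I would bolt it onto a generic training step. It is a sketch of the standard `torch.cuda.amp` pattern under my own assumptions, not the actual U2PL loop:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Sketch only: the standard AMP pattern, NOT the U2PL training loop.
scaler = GradScaler()

def amp_step(model, criterion, optimizer, images, labels):
    optimizer.zero_grad(set_to_none=True)
    with autocast():  # run the forward pass in mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```

I have not tried this yet, since I would rather understand why full precision OOMs only at this one configuration first.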
I have no idea what's happening here. Would really appreciate some help. Thank you!