I have an NVIDIA RTX 3060 (6 GB VRAM) on which I am trying to fine-tune a CLIP model, but I keep running out of memory.
A custom library reports that the model weighs approximately 350 MB, with 177M trainable parameters. My dataset, containing 200 rows, is less than 100 MB. However, when I launch training with a batch size of 64, VRAM usage gradually climbs to maximum capacity within 5 training steps (batch passes), and the run fails with the following error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.20 GiB already allocated; 0 bytes free; 5.32 GiB reserved in total by PyTorch)
Training does work with a batch size of 32, but it still comes close to saturating the VRAM.
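For reference, this is roughly how I am reading the usage between steps (a minimal sketch using PyTorch's built-in memory counters):

import torch

# Quick check of GPU memory after a training step (numbers in GiB).
# "allocated" is what live tensors actually use; "reserved" is what
# PyTorch's caching allocator has claimed from the driver.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")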
Isn't the allocated memory supposed to be roughly [model size + batch size]? Or do the gradients (and activations) really take that much memory? If so, is there a relationship between model size and batch size (in bytes) that would help me find a trade-off between the two?
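This is the back-of-the-envelope estimate I had in mind, assuming fp32 training and an Adam-style optimizer (I may well be missing terms):

# Static memory only, assuming fp32 weights with a plain Adam optimizer
# (weights + gradients + two moment buffers per parameter).
# Activation memory comes on top of this and scales with batch size.
n_params = 177e6
bytes_per_param = 4                                  # fp32

weights = n_params * bytes_per_param                 # ~0.66 GiB
grads = n_params * bytes_per_param                   # ~0.66 GiB
adam_state = 2 * n_params * bytes_per_param          # ~1.32 GiB (exp_avg + exp_avg_sq)

total_gib = (weights + grads + adam_state) / 1024**3
print(f"static memory ~ {total_gib:.2f} GiB")        # ~2.64 GiB before any activations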
I have also tried the solutions proposed in other threads such as this one, but torch.cuda.empty_cache() only frees part of the memory: over 3 GB remain allocated after running it. I would also like to keep the batch size as large as possible.
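In case it matters, the workaround I am considering is gradient accumulation with mixed precision to emulate a larger effective batch size. Here is a minimal sketch; model, optimizer and dataloader are placeholders for my actual setup, and I am assuming the forward pass returns the contrastive loss directly:

import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                      # effective batch = batch_size * accum_steps

optimizer.zero_grad(set_to_none=True)
for step, (images, texts) in enumerate(dataloader):
    images, texts = images.cuda(), texts.cuda()
    with torch.cuda.amp.autocast():          # fp16 activations roughly halve activation memory
        loss = model(images, texts) / accum_steps   # assumes the model returns the loss
    scaler.scale(loss).backward()            # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)

Would something like this be the right direction, or is the activation memory at batch size 64 simply too much for 6 GB regardless?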