To fine-tune the StarCoder LLM on my GCP instance, I have set up 4 NVIDIA Tesla T4 GPUs (16 GB each). I installed nvitop to monitor GPU usage while fine-tuning.
I have also installed the CUDA toolkit on the VM (verified with nvcc --version).
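For context, a quick check like the one below (assuming PyTorch, which the traceback at the end comes from) confirms that all four devices are at least visible to the process:

```python
import torch

# Sanity check: does the training process see all four T4s?
# If this prints 1, the process is restricted to a single GPU
# (e.g. by CUDA_VISIBLE_DEVICES) regardless of framework settings.
print(torch.cuda.is_available())   # should be True
print(torch.cuda.device_count())   # should be 4
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```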
The problem is that all the computation currently happens on a single GPU (GPU0), so as soon as the model needs more than 16 GB it raises a CUDA OutOfMemoryError.
How do I ensure the work is spread across all 4 GPUs? Is there any additional configuration needed at the VM level? I'm new to this, so any guidance is appreciated. Thanks in advance.

The exact error:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 14.62 GiB total capacity; 13.16 GiB already allocated; 103.38 MiB free; 13.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
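From what I've read, the transformers `device_map="auto"` option (backed by the accelerate library) is supposed to shard the model's layers across all visible GPUs instead of placing everything on GPU0, so something along these lines might be what's missing; this is only a sketch based on the docs, not code I have working:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder"  # assuming the Hugging Face checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" asks accelerate to split the model's layers across
# the 4 T4s (model parallelism) rather than loading it all onto GPU0.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
```

Is this the right direction, or does multi-GPU use also require something at the VM/driver level?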