
I have tried almost every option to train the model, including reducing the batch size to 1 and the other steps described in How do I select which GPU to run a job on?, but I still get the error:

RuntimeError: CUDA out of memory. Tried to allocate 238.00 MiB (GPU 3; 15.90 GiB total capacity; 15.20 GiB already allocated; 1.88 MiB free; 9.25 MiB cached)

This is the notebook, configured in an Azure ML workspace with N24-GPU.
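For context, the GPU-selection step from the linked answer is usually done by restricting which devices CUDA can see. This is a hypothetical sketch (not from my notebook); the device index "3" simply matches the GPU reported in the error message, and the `cfg.SOLVER.IMS_PER_BATCH` line shows where detectron2's batch size is set:

```python
import os

# Restrict CUDA to a single device *before* torch/detectron2 create a CUDA
# context, so memory on the other GPUs is never touched.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# In the detectron2 config, the batch-size reduction mentioned above is:
# cfg.SOLVER.IMS_PER_BATCH = 1
```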

Thank you.

jaiswati_b

1 Answer


Check your memory usage before you start training; sometimes detectron2 doesn't free VRAM after use, particularly if a training run crashes. If that is what happened, the easiest short-term fix is a reboot.

As for a long-term fix, I can't give any advice other than making sure you are using the latest version of everything.

Fred