I am using YOLOv3 by Ultralytics (PyTorch) to detect the behaviour of cows in a video. The YOLOv3 model was trained to detect each individual cow in the frame. Each detected cow is cropped from the frame using the X and Y coordinates of its bounding box, and the crop is then passed to a second model that determines whether the cow is sitting or standing. The second model, a very simple InceptionV3 classifier built with TensorFlow, was also trained on our own dataset.
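For context, the two-stage pipeline looks roughly like the sketch below. This is a simplified illustration, not the actual code in detectv2.py; the model path, classify_crops, and CLASS_NAMES are placeholders for my own code.

import cv2
import numpy as np
import tensorflow as tf

# Hypothetical path to my trained sitting/standing classifier
behaviour_model = tf.keras.models.load_model("inceptionv3_behaviour.h5")
CLASS_NAMES = ["sitting", "standing"]

def classify_crops(frame, detections):
    """detections: iterable of (x1, y1, x2, y2) boxes from YOLOv3."""
    labels = []
    for x1, y1, x2, y2 in detections:
        crop = frame[int(y1):int(y2), int(x1):int(x2)]          # crop the cow
        crop = cv2.resize(crop, (299, 299)).astype(np.float32)  # InceptionV3 input size
        crop = tf.keras.applications.inception_v3.preprocess_input(crop)
        probs = behaviour_model.predict(crop[None, ...], verbose=0)
        labels.append(CLASS_NAMES[int(np.argmax(probs))])
    return labels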
However, whenever I try to load both models, I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 16.00 GiB total capacity; 427.42 MiB already allocated; 7.50 MiB free; 448.00 MiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
If the second model is not loaded, YOLOv3 (PyTorch) runs without any issue and does not even use the full 16 GB of VRAM. Is YOLOv3 reserving the whole VRAM and leaving nothing for the TensorFlow-based InceptionV3? If so, is there any way of forcing torch to keep 2 GB of VRAM aside?
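One thing I was considering is capping the GPU memory on the TensorFlow side before any GPU op runs, along the lines of the sketch below, but I am not sure whether this is the right approach or whether the limit should instead be set on the PyTorch side (e.g. with torch.cuda.set_per_process_memory_fraction). This is untested for my setup.

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Option A: let TF grow its allocation on demand instead of pre-allocating
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # Option B: hard-cap TF to ~2 GB (must be set before any GPU op runs)
    # tf.config.set_logical_device_configuration(
    #     gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])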
Full console output:
>> python detectv2.py --weights best.pt --source outch06_20181022073801_0_10.avi
2022-06-01 16:02:40.975544: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-01 16:02:41.342394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14123 MB memory: -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:65:00.0, compute capability: 7.5
detectv2: weights=['best.pt'], source=outch06_20181022073801_0_10.avi, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
Empty DataFrame
Columns: []
Index: []
YOLOv3 2022-5-16 torch 1.11.0 CUDA:0 (Quadro RTX 5000, 16384MiB)
Fusing layers...
Model Summary: 269 layers, 62546518 parameters, 0 gradients
Traceback (most recent call last):
File "detectv2.py", line 462, in <module>
main(opt)
File "detectv2.py", line 457, in main
run(**vars(opt))
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "detectv2.py", line 221, in run
model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.model.parameters()))) # warmup
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\sourav\yolov3-master\models\common.py", line 357, in forward
y = self.model(im) if self.jit else self.model(im, augment=augment, visualize=visualize)
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\sourav\yolov3-master\models\yolo.py", line 127, in forward
return self._forward_once(x, profile, visualize) # single-scale inference, train
File "C:\Users\sourav\yolov3-master\models\yolo.py", line 150, in _forward_once
x = m(x) # run
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\sourav\yolov3-master\models\common.py", line 48, in forward_fuse
return self.act(self.conv(x))
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Users\sourav\Anaconda3\envs\yl37\lib\site-packages\torch\nn\modules\conv.py", line 444, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 16.00 GiB total capacity; 427.42 MiB already allocated; 7.50 MiB free; 448.00 MiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF