
I am trying to run inference through a CLI to get predictions from a detection model and a recognition model. With CUDA 10.2 the inference completes in about 15 minutes, but with CUDA 11.3 it takes 3 hours, and I want to reduce this time. Note: my hardware does not support CUDA 10.2.

Hence, I have the following packages installed:

  1. cudatoolkit 11.3.1 h2bc3f7f_2
  2. pytorch 1.10.0 py3.7_cuda11.3_cudnn8.2.0_0 pytorch
  3. torchvision 0.11.0 py37_cu113 pytorch

I get this error when I run the inference CLI:

RuntimeError: CUDA out of memory. Tried to allocate 2.05 GiB (GPU 0; 5.81 GiB total capacity; 2.36 GiB already allocated; 1.61 GiB free; 2.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What I have tried:

  1. Changing the batch_size for both the detection and the recognition models

Kindly help!

Thank you.

  • The only thing you can do is reduce the batch size progressively until the operation fits on the GPU. If it still does not fit even at the minimum batch size, you may need to run the model on the CPU (much slower) or look for smaller versions of the models you are using. By the way, is the inference code your own? One thing that has happened to me on occasion is forgetting to wrap the forward pass in `with torch.no_grad()`. This matters because gradients take up a lot of GPU memory. – Cuartero Jul 12 '22 at 10:33
  • @ÁlvaroCuarteroMontilla thank you for your response. Yes, the inference code is my own. I did not understand the `with torch.no_grad()` part. – aarya Jul 12 '22 at 12:30
  • So have you solved it? – Cuartero Jul 12 '22 at 13:28
  • Sorry @ÁlvaroCuarteroMontilla, no, I have not solved it. In fact, I did not understand your sentence, "with torch.no_grad() when forwarding the model". – aarya Jul 12 '22 at 14:25
  • Can you provide your whole code so I can help you better? – Cuartero Jul 12 '22 at 15:32
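
For reference, here is a minimal sketch of the inference pattern suggested in the comments above; the model, input shapes, and batches are placeholders rather than the asker's actual code:

```python
import torch
import torchvision

# Placeholder model and data; substitute the real detection/recognition
# models and the real input pipeline here.
model = torchvision.models.resnet18(pretrained=True)
model.eval()          # switch off dropout / batch-norm updates
model = model.cuda()

dummy_batches = [torch.randn(1, 3, 224, 224) for _ in range(4)]  # batch_size=1 placeholders

# torch.no_grad() stops autograd from building a graph and keeping
# intermediate activations, which is where a lot of GPU memory goes
# when a forward pass is run without it.
with torch.no_grad():
    for batch in dummy_batches:
        batch = batch.cuda()
        output = model(batch)   # forward pass only, no gradients stored
        result = output.cpu()   # move results off the GPU as soon as possible
```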

1 Answer


I use the following solutions whenever I encounter a "CUDA out of memory" error, ordered from the simplest to the hardest:

1- Try to reduce the batch size. First, run the model on a single datum (batch_size=1) to save time. If that works without error, you can try a higher batch size; if it does not, you should look for another solution.

2- Try to use a different optimizer, since some optimizers require less memory than others. For instance, SGD requires less memory than Adam, because Adam keeps extra running-moment state for every parameter (a small sketch follows the list below).

3- Try to use a simpler model with fewer parameters.

4- Try to divide the model into two (or more) separate parts and update each part's parameters separately in each epoch. Note that whenever you compute gradients and update the parameters of one part, the parameters of the other part should be frozen. This lowers the amount of GPU memory required in each step, which may solve the problem; a minimal sketch of this idea appears below, after the list.

5- Lastly, if none of the above solutions work, GPU computation cannot be used.
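
As an illustration of point 2, switching the optimizer is usually a one-line change; the model and learning rates below are placeholders:

```python
import torch

model = torch.nn.Linear(1000, 1000).cuda()  # placeholder model

# Adam keeps two extra state tensors (running moment estimates) per parameter,
# so its optimizer state roughly triples the memory used by the parameters.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Plain SGD (without momentum) keeps no per-parameter state.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
```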

Note that you can combine these solutions, which may prevent the error. For instance, a smaller batch size together with a simpler optimizer may work in some situations.
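
And here is a minimal sketch of idea 4, freezing one part of a model while updating the other; the two halves, sizes, and loss are hypothetical and only for illustration:

```python
import torch
import torch.nn as nn

# Two hypothetical halves of a larger model.
part1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU()).cuda()
part2 = nn.Sequential(nn.Linear(256, 10)).cuda()

opt1 = torch.optim.SGD(part1.parameters(), lr=1e-2)
opt2 = torch.optim.SGD(part2.parameters(), lr=1e-2)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

x = torch.randn(8, 512).cuda()               # placeholder batch
target = torch.randint(0, 10, (8,)).cuda()   # placeholder labels
loss_fn = nn.CrossEntropyLoss()

# Step A: update part1 only. part2 is frozen, so no gradient buffers
# are allocated for its parameters.
set_requires_grad(part1, True)
set_requires_grad(part2, False)
loss = loss_fn(part2(part1(x)), target)
opt1.zero_grad()
loss.backward()
opt1.step()

# Step B: update part2 only, with part1 frozen.
set_requires_grad(part1, False)
set_requires_grad(part2, True)
loss = loss_fn(part2(part1(x)), target)
opt2.zero_grad()
loss.backward()
opt2.step()
```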

  • I would add: 6 - Sometimes there are multiple GPUs available; then you may want to make sure you use the one with the most free memory. – Lucas Meier Jul 13 '22 at 08:18
  • That is correct! You may also use all the GPUs in your training procedure. For instance, if you split the model into parts, you can update the parameters of each part on a different GPU. However, that requires extra effort from the programmer to implement. – sadegh arefizadeh Jul 13 '22 at 08:29
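
Regarding the multi-GPU suggestion, here is one possible sketch for picking the device with the most free memory; it relies on `torch.cuda.mem_get_info`, which exists in recent PyTorch releases (on older versions you would need to query `nvidia-smi` instead):

```python
import torch

def freest_gpu():
    """Return the index of the CUDA device with the most free memory."""
    free_bytes = [
        torch.cuda.mem_get_info(i)[0]   # returns (free, total) in bytes
        for i in range(torch.cuda.device_count())
    ]
    return max(range(len(free_bytes)), key=free_bytes.__getitem__)

device = torch.device(f'cuda:{freest_gpu()}')
# model = model.to(device)  # then move the model and inputs to this device
```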