CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

Question

I got the following error when I ran my PyTorch deep learning model in Google Colab:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
-> 1372         output = input.matmul(weight.t())
   1373         if bias is not None:
   1374             output += bias

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

I even reduced the batch size from 128 to 64 (i.e., halved it), but I still got this error. Earlier, I ran the same code with a batch size of 128 without getting any error like this.

The error and the answers seem to suggest that the GPU memory is somehow full and that this is not caught by the standard safety checks. I got the error when too many (notebook) Python kernels were using the GPU at the same time. – F.Wessels, Dec 21 '21 at 13:57
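One way to test this hypothesis is to check how much GPU memory is still free before the model's first matrix multiplication. The sketch below assumes PyTorch 1.10 or newer (for `torch.cuda.mem_get_info`); the helper name `report_gpu_memory` is hypothetical. On a machine without CUDA it simply returns `None`. If the free figure is close to zero, for example because other notebook kernels are holding memory, cuBLAS may be unable to allocate what it needs when creating its handle.

```python
import torch


def report_gpu_memory(device_index=0):
    """Return (free_bytes, total_bytes) for one GPU, or None without CUDA.

    Hypothetical helper built on torch.cuda.mem_get_info (PyTorch >= 1.10).
    """
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(device_index)
    print(f"GPU {device_index}: {free / 2**20:.0f} MiB free of {total / 2**20:.0f} MiB")
    return free, total


info = report_gpu_memory()
```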

score 29 · Answer 1 · edited Apr 18 '23 at 13:41

No, batch size does not matter in this case.

The most likely reason is an inconsistency between the number of labels and the number of output units.

Try printing the size of the final output in the forward pass:

print(model.fc1(x).size())
Here, replace fc1 with the name of your model's last linear layer (the one whose output the forward pass returns).

Make sure that label.size() is equal to prediction.size() before calculating the loss.

Even after fixing that problem, you will have to restart the GPU runtime (I needed to do this when using a Colab GPU), because a CUDA error can leave the process's CUDA context unusable.
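The shape check described above can be sketched as follows. It runs one forward pass on the CPU with a dummy batch and compares the output against the labels before any loss is computed. `TinyNet`, its layer names, and all sizes are hypothetical stand-ins for your own model. For losses such as `nn.MSELoss` or `nn.BCEWithLogitsLoss` the two sizes should be identical, as the answer says; for `nn.CrossEntropyLoss` the labels are class indices of shape `(batch,)`, so only the first dimension has to match.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical classifier; fc1 plays the role of the last linear layer."""

    def __init__(self, in_features=20, num_classes=3):
        super().__init__()
        self.hidden = nn.Linear(in_features, 16)
        self.fc1 = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.fc1(torch.relu(self.hidden(x)))


def check_shapes(model, x, labels):
    """Print output/label sizes and fail early if the batch sizes disagree."""
    with torch.no_grad():
        output = model(x)
    print("output:", tuple(output.size()), "labels:", tuple(labels.size()))
    # CrossEntropyLoss case: output is (batch, num_classes), labels are (batch,).
    if output.size(0) != labels.size(0):
        raise ValueError("batch dimension of output and labels differ")
    return output.size()


model = TinyNet()
x = torch.randn(8, 20)
labels = torch.randint(0, 3, (8,))
check_shapes(model, x, labels)
```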

This GitHub issue comment might also be helpful.

score 23 · Answer 2 · answered Oct 02 '20 at 15:32

This error can have several different causes. It is recommended to debug CUDA errors by running the code on the CPU, if possible. If that's not possible, try executing the script via:

CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM]

This makes CUDA kernel launches synchronous, so the stack trace points to the line of code that actually raised the error, and you can fix it from there.
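If you cannot set the variable on the command line (for example in a Colab notebook), a common workaround is to set it from Python before `torch` is imported, so it is in place before CUDA is initialised. The sketch also shows why the CPU run helps: an out-of-range label, which on the GPU can surface as an asynchronous, misleading CUDA error, produces a readable message on the CPU. The tensor sizes are made up for illustration.

```python
import os

# Must be set before torch initialises CUDA, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
import torch.nn.functional as F

# Reproduce on the CPU first: errors are raised synchronously with clear messages.
device = torch.device("cpu")

logits = torch.randn(4, 3, device=device)               # 4 samples, 3 classes
bad_labels = torch.tensor([0, 1, 2, 5], device=device)  # 5 is not a valid class

try:
    F.cross_entropy(logits, bad_labels)
except (IndexError, RuntimeError) as err:
    # On the CPU this is a readable error such as "Target 5 is out of bounds."
    message = str(err)
    print("CPU error:", message)
```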

– HLeb, answered Oct 02 '20 at 15:32

Thanks @HLeb. I ran my program with CUDA_LAUNCH_BLOCKING=1, but it still outputs `RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)`. Why is it outputting a CUDA error? – haneulkim Mar 25 '21 at 07:04
That's strange. Try running directly on the CPU; that's usually the default, but you might need to modify your code if it prioritises the GPU. It depends on what you're executing. – HLeb Mar 28 '21 at 11:12

score 13 · Answer 3 · answered Jan 24 '21 at 22:51

Reducing batch size works for me and the training proceeds as planned.

– Frank Puk, answered Jan 24 '21 at 22:51

score 6 · Answer 4 · edited Apr 18 '23 at 13:31

This error means "Resource allocation failed inside the cuBLAS library".

Decreasing the batch size solved the issue for me. You said you reduced it to 64 and that didn't help; try 32, 8, 1, etc. as well.

Also, try running the same code on your CPU to check whether your tensors' shapes are correct.
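If you follow the advice to keep lowering the batch size, a small retry loop saves restarting by hand. This is a sketch under stated assumptions: `run_step` is a hypothetical callable that runs one training step at a given batch size, and the error strings matched are the one quoted in this thread plus PyTorch's usual "out of memory" wording. It is plain Python, so the logic can be tried without a GPU. Some CUDA errors leave the process unusable, in which case restarting the runtime is still required.

```python
RETRYABLE = ("CUBLAS_STATUS_ALLOC_FAILED", "out of memory")


def find_working_batch_size(run_step, start, minimum=1, on_failure=None):
    """Halve the batch size until run_step(batch_size) stops raising.

    run_step: hypothetical callable that performs one training step.
    on_failure: optional cleanup hook, e.g. torch.cuda.empty_cache.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if not any(marker in str(err) for marker in RETRYABLE):
                raise  # a different bug (e.g. a shape error): do not hide it
            print(f"batch_size={batch_size} failed: {err}")
            if on_failure is not None:
                on_failure()
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} worked")


def fake_step(batch_size):
    # Simulates a GPU that can only fit 16 samples per step.
    if batch_size > 16:
        raise RuntimeError(
            "CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`"
        )


print(find_working_batch_size(fake_step, start=128))  # 128, 64 and 32 fail; prints 16
```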

– Serhiy, answered Sep 24 '20 at 05:50; edited by TylerH, Apr 18 '23 at 13:31

score 4 · Answer 5 · edited Apr 18 '23 at 13:32

One cause of this problem can be that the number of labels (classes) does not equal the number of network output channels, i.e. the number of output classes predicted. Adjust the output layer to match, and that should fix the issue.
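To catch this mismatch before it reaches the GPU, you can compare the labels against the size of the last layer once, before training. A minimal sketch for the common case of integer class labels used with `nn.CrossEntropyLoss`; the helper name `check_labels_fit_output` and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn


def check_labels_fit_output(labels, output_layer):
    """Raise ValueError if any class index cannot be produced by output_layer.

    Assumes integer class labels in [0, num_classes) for nn.CrossEntropyLoss.
    """
    num_outputs = output_layer.out_features
    low, high = int(labels.min()), int(labels.max())
    if low < 0 or high >= num_outputs:
        raise ValueError(
            f"labels span [{low}, {high}] but the last layer has "
            f"{num_outputs} outputs; expected labels in [0, {num_outputs - 1}]"
        )
    return num_outputs


labels = torch.tensor([0, 2, 1, 2])
check_labels_fit_output(labels, nn.Linear(84, 3))    # fine: 3 classes
# check_labels_fit_output(labels, nn.Linear(84, 2))  # would raise ValueError
```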

– Peter Pack, answered Mar 26 '21 at 06:09; edited by TylerH, Apr 18 '23 at 13:32

score 1 · Answer 6 · answered Feb 08 '22 at 20:09

I had the same problem. While I don't know the exact reason, I know the cause: the last layer of my nn.Module was

 self.fc3 = nn.Linear(84, num_classes)

I changed my actual number of classes to be twice as large, but I did not update the num_classes variable, which probably caused a mismatch somewhere when outputting the results.

After I fixed the value of num_classes, it just worked. I recommend going over the numbers in your model again.

– ntg7 gamer, answered Feb 08 '22 at 20:09

This seems like a totally different issue (caused by some unreproducible error in your code), unrelated to the question at hand. – TylerH Apr 18 '23 at 13:33

score 0 · Answer 7 · answered Feb 06 '22 at 13:22

My model classifies two classes with only one neuron in the last layer. I had this problem when the last layer was nn.Linear(512, 1) in PyTorch, while my labels were just [0] or [1]. I solved the problem by adding an nn.Sigmoid() layer.
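The answer does not show which loss was used, so the following is an assumption: a common cause of this pattern is pairing a single output unit with `nn.CrossEntropyLoss`, which then treats label `1` as an out-of-range class index. The usual setup for one output unit is `nn.BCEWithLogitsLoss`, which applies the sigmoid internally and expects float targets with the same shape as the output. If you keep an explicit `nn.Sigmoid()` layer instead, pair it with `nn.BCELoss`. The layer size follows the answer; the batch is random.

```python
import torch
import torch.nn as nn

# One logit per sample for a two-class problem, as in the answer.
head = nn.Linear(512, 1)
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one step

features = torch.randn(4, 512)
labels = torch.tensor([0, 1, 1, 0])

logits = head(features)                # shape (4, 1)
targets = labels.float().unsqueeze(1)  # shape (4, 1), float, to match logits
loss = criterion(logits, targets)

# At prediction time, apply the sigmoid explicitly to get probabilities.
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long().squeeze(1)
```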

– Yeally, answered Feb 06 '22 at 13:22

score 0 · Answer 8 · answered Apr 15 '22 at 01:23

For a large-scale dataset, just delete the temporary variables at the end of each iteration:

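for batch_idx, (x, target) in enumerate(train_dataloader):
    ...  # forward pass, loss, backward pass and optimizer step go here
    del x, target, loss, outputs  # drop references to this batch's tensors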
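A slightly fuller version of the loop above, as a sketch: the model, data, and optimizer are hypothetical placeholders, and it runs on whichever device is available. Besides `del`, a habit that keeps memory from piling up is accumulating `loss.item()` (a Python float) rather than the `loss` tensor, which would otherwise keep each step's autograd graph alive. `torch.cuda.empty_cache()` only returns cached, unused blocks to the driver, which can help when other processes share the GPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the real model and dataset.
model = nn.Linear(20, 3).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(64, 20), torch.randint(0, 3, (64,)))
train_dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

running_loss = 0.0
for batch_idx, (x, target) in enumerate(train_dataloader):
    x, target = x.to(device), target.to(device)
    optimizer.zero_grad()
    outputs = model(x)
    loss = criterion(outputs, target)
    loss.backward()
    optimizer.step()

    running_loss += loss.item()   # a float, so no graph is kept alive
    del x, target, loss, outputs  # drop references before the next batch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # release cached, unused blocks back to the driver
```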

– shenci zeng, answered Apr 15 '22 at 01:23

score 0 · Answer 9 · edited Apr 18 '23 at 13:33

Reducing the maximum sequence length for a model that has a limit (e.g. BERT) solved this error for me.

I also hit the same issue after resizing a model's embedding layer with model.resize_token_embeddings(NEW_SIZE), training it, and saving it.

At prediction time, when I loaded the model, I needed to resize the embedding layer again!
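Both fixes come down to an index that an embedding table cannot hold: token IDs must be below the vocabulary size, and for models like BERT the sequence must not exceed the maximum length the position embeddings support. On the CPU an out-of-range lookup raises a clear `IndexError`, which is a quick way to confirm this cause. The sizes and the `embed` helper below are hypothetical; with Hugging Face `transformers`, the corresponding fixes are tokenizing with `truncation=True` and `max_length` set, and resizing the embeddings after loading, as described above.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 100  # hypothetical vocabulary size
MAX_LEN = 8       # hypothetical maximum sequence length (BERT uses 512)

token_embedding = nn.Embedding(VOCAB_SIZE, 16)


def embed(token_ids, max_len=MAX_LEN):
    """Truncate to max_len, then check IDs fit the table before the lookup."""
    token_ids = token_ids[:, :max_len]
    largest = int(token_ids.max())
    if largest >= token_embedding.num_embeddings:
        raise ValueError(
            f"token id {largest} >= vocabulary size "
            f"{token_embedding.num_embeddings}; resize the embedding layer"
        )
    return token_embedding(token_ids)


ok = embed(torch.randint(0, VOCAB_SIZE, (2, 12)))  # truncated to (2, 8, 16)

# Without the check, the CPU lookup itself fails with an "index out of range" error.
try:
    token_embedding(torch.tensor([[VOCAB_SIZE]]))
except (IndexError, RuntimeError) as err:
    cpu_message = str(err)
```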

– Minions, answered Aug 17 '22 at 15:44; edited by TylerH, Apr 18 '23 at 13:33
