
I am working on a text classification task using a BERT model. I have fine-tuned my model and want to load it and run inference on various machines. Most of the time it works just fine, but in one specific environment I keep getting a CUDA error: CUBLAS_STATUS_INVALID_VALUE.

(The torch, numpy, pytorch-transformers, and Python versions are all set the same on every machine.)

Even weirder, the error occurs only when I'm working on CUDA. (It works fine on CPU.)

So I tracked down the error, and I have reached the conclusion that it comes from torch.matmul(tensor1, tensor2).

When I execute the following, it gives me torch.Size([3]), which is correct.

import torch

device = 'cpu'
tensor1 = torch.randn(3, 4).to(device)
tensor2 = torch.randn(4).to(device)
torch.matmul(tensor1, tensor2).size()

But when I execute the same code on CUDA, it throws an error:

import torch

device = 'cuda:0'
tensor1 = torch.randn(3, 4).to(device)
tensor2 = torch.randn(4).to(device)
torch.matmul(tensor1, tensor2).size()

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemv(handle, op, m, n, &alpha, a, lda, x, incx, &beta, y, incy)
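When a cuBLAS call fails on only one machine, it usually helps to compare the exact torch build and CUDA runtime across the environments. A minimal diagnostic sketch (the helper name here is just for illustration, not from the original post):

```python
import torch

def cuda_env_report():
    """Collect the version info that matters for cuBLAS-level errors."""
    report = {
        "torch": torch.__version__,
        # CUDA version this torch wheel was *built* against
        # (may differ from the driver/toolkit installed on the machine)
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name and compute capability of the first visible GPU
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report

print(cuda_env_report())
```

Running this on a working machine and on the failing one, then diffing the two dicts, narrows down whether the problem is the wheel, the driver, or the GPU itself.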

+) In case it helps, I'm currently using:

python 3.9.16

pytorch-transformers 1.2.0

torch 1.13.0


1 Answer


I think it was caused by a mismatch between the CUDA version and the PyTorch version. I still can't understand why it worked fine on CUDA 11.2 and caused the problem on CUDA 11.4, but when I uninstalled PyTorch 1.13.0 and installed 1.12.1 instead, the code worked as it was supposed to.
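After swapping the PyTorch version, a quick sanity check is to rerun the exact call that failed. A small sketch that reproduces it, falling back to CPU when no GPU is present (the helper name and the fallback behaviour are my own additions):

```python
import torch

def check_matmul(device: str = "cuda:0"):
    """Reproduce the originally failing matmul on the given device."""
    if device.startswith("cuda") and not torch.cuda.is_available():
        device = "cpu"  # fall back so the check still runs without a GPU
    tensor1 = torch.randn(3, 4, device=device)
    tensor2 = torch.randn(4, device=device)
    # (3, 4) @ (4,) -> shape (3,); raises on a broken CUDA/cuBLAS setup
    return torch.matmul(tensor1, tensor2).size()

print(check_matmul())
```

On a healthy install this prints torch.Size([3]) for both 'cpu' and 'cuda:0'; on the broken environment the CUDA path raises the CUBLAS_STATUS_INVALID_VALUE error again.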
