Error invalid value when using CUDA

Question

I am trying to run an example program I found here:

import numpy as np
from timeit import default_timer as timer
from numbapro import vectorize

@vectorize(["float32(float32, float32)"], target='gpu')
def VectorAdd(a, b):
    return a + b

def main():
    N = 32000000 # Number of elements per array

    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = np.zeros(N, dtype=np.float32)

    start = timer()
    C = VectorAdd(A,B)
    vectoradd_time = timer() - start

    print("C[:5] = "+str(C[:5]))
    print("C[-5:] = "+str(C[-5:]))

    print("VectorAdd took %f seconds" % vectoradd_time)


if __name__ == '__main__':
    main()

but I am getting the following traceback error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Charlie\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "C:/Users/Charlie/Desktop/development/gcc_vs_pythonGPU/GPU_python.py", line 27, in <module>
    main()
  File "C:/Users/Charlie/Desktop/development/gcc_vs_pythonGPU/GPU_python.py", line 17, in main
    C = VectorAdd(A,B)
  File "C:\aroot\stage\Lib\site-packages\numbapro\cudavec\dispatch.py", line 36, in __call__
  File "C:\aroot\stage\Lib\site-packages\numbapro\common\deviceufunc.py", line 207, in call
  File "C:\aroot\stage\Lib\site-packages\numbapro\cudavec\dispatch.py", line 207, in launch
  File "C:\aroot\stage\Lib\site-packages\numbapro\cudapy\plugins.py", line 95, in __call__
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\compiler.py", line 228, in __call__
    sharedmem=self.sharedmem)
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\compiler.py", line 268, in _kernel_call
    cu_func(*args)
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1044, in __call__
    self.sharedmem, streamhandle, args)
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1088, in launch_kernel
    None)
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 215, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "C:\Users\Charlie\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 245, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE

Any ideas as to what is going wrong? I tried running the "basic example" on this page: http://docs.continuum.io/numbapro/CUDAufunc.html and it worked fine, but the first one is causing problems and I don't know why. If it makes any difference, I didn't install Visual Studio, as was suggested, but I don't think that is the issue. Any help is greatly appreciated.

What about dtype and import of cuda? It may seem like voodoo programming, but I would try to keep that as close to the example as possible to narrow potential issues. Also are you sure you only need the float32 and not the float64 data type as well? — Peter Smith, Dec 29 '14 at 22:19
The first link I gave shows a video of the program I'm trying to run (which is not working). The second link I gave works for me. My code is copied from a snapshot at time 6:31 in the video, which is just before the code is executed. I don't see cuda imported there, and I only see float32. Sorry I didn't print the whole link out, that was a bit confusing. — Charles, Dec 29 '14 at 22:24
What happens if you reduce N to a number less than 65536, say `N = 32000 # Number of elements per array`? What GPU are you running on? — Robert Crovella, Dec 29 '14 at 22:38
Wow. Reducing N did fix the problem. I'm running on an NVIDIA GeForce GT520. Do you know why this happens? (I'm very new to CUDA/GPUs.) — Charles, Dec 29 '14 at 22:41
I'm not a numbapro expert. I suspect it has something to do with the limitations of your GPU, either in terms of compute capability (2.1) or memory size. Does your GPU have 1GB of memory? Is it hosting a Windows display? Try adding the following line near the beginning of `main`: `VectorAdd.max_blocksize = 512` and then change `N` back to `32000000` and retry. Also try `VectorAdd.max_blocksize = 1024` — Robert Crovella, Dec 29 '14 at 23:16
Hmm, I think you're right. Setting max_blocksize did not make the program run either. This kind of throws a wrench in the works. Thank you for your help! — Charles, Dec 29 '14 at 23:38
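
For anyone landing here later, here is a rough sketch of the arithmetic behind the comments above. It is not a confirmed diagnosis. Compute capability 2.x devices allow at most 65535 blocks in the x-dimension of a grid, so a one-thread-per-element launch over 32,000,000 elements only fits if the block size is large enough. How numbapro actually chooses its block size is not shown in this thread, and setting max_blocksize did not help here, so the grid limit is only one possible cause alongside memory size. The helper names `blocks_needed` and `apply_in_chunks` are hypothetical, and `np.add` stands in for the GPU `VectorAdd` so the sketch runs without a GPU.

```python
import numpy as np

# Documented maximum grid x-dimension on compute capability 2.x devices.
MAX_GRID_X_CC2X = 65535


def blocks_needed(n_elements, block_size):
    """Blocks required for one thread per element (ceiling division)."""
    return (n_elements + block_size - 1) // block_size


def apply_in_chunks(func, a, b, chunk_size):
    """Apply an elementwise two-argument function slice by slice.

    Hypothetical workaround: each call to func only sees chunk_size
    elements, so every individual launch stays small.
    """
    out = np.empty_like(a)
    for start in range(0, a.size, chunk_size):
        stop = min(start + chunk_size, a.size)
        out[start:stop] = func(a[start:stop], b[start:stop])
    return out


n = 32000000  # N from the question

# Does a one-thread-per-element launch fit under the grid limit?
for block_size in (128, 256, 512, 1024):
    blocks = blocks_needed(n, block_size)
    print(block_size, blocks, blocks <= MAX_GRID_X_CC2X)

# Bytes needed for the three float32 arrays A, B and C.
print(3 * n * np.dtype(np.float32).itemsize)

# Small CPU-only demonstration of the chunked call, with np.add as a
# stand-in for VectorAdd.
a = np.ones(10, dtype=np.float32)
b = np.ones(10, dtype=np.float32)
print(apply_in_chunks(np.add, a, b, chunk_size=4))
```

To try this against the original program, one could pass `VectorAdd` in place of `np.add` with a chunk size that is known to work on this machine (32000 ran according to the comments). Whether that avoids the error for the full 32,000,000 elements is untested.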
