I am trying to implement this fairly straightforward function as a CUDA kernel using Numba:
import numba as nb

@nb.njit(parallel=True)
def sum_pixel_signals_cpu(pixels_signals, signals, index_map):
    # track_starts and consts.t_sampling are module-level globals
    for it in nb.prange(signals.shape[0]):
        for ipix in nb.prange(signals.shape[1]):
            index = index_map[it][ipix]
            start_tick = track_starts[it] // consts.t_sampling
            for itick in nb.prange(signals.shape[2]):
                itime = int(start_tick + itick)
                pixels_signals[index, itime] += signals[it][ipix][itick]
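For reference, track_starts and consts.t_sampling are globals that the jitted function picks up at compile time (in my real code consts is a separate module). A minimal, self-contained sketch of that setup, with placeholder shapes and values rather than my real data:

    import types
    import numpy as np

    # Stand-in for my real `consts` module, so this snippet runs standalone.
    consts = types.ModuleType("consts")
    consts.t_sampling = 0.1  # placeholder sampling period

    n_tracks, n_pixels, n_ticks = 10, 4, 50

    # One start time per track; placeholder values.
    track_starts = np.linspace(0.0, 5.0, n_tracks)

    signals = np.random.rand(n_tracks, n_pixels, n_ticks)
    index_map = np.random.randint(0, n_pixels, size=(n_tracks, n_pixels))
    # Output buffer, sized generously so the largest itime stays in bounds.
    pixels_signals = np.zeros((n_pixels, n_ticks + 60))

    sum_pixel_signals_cpu(pixels_signals, signals, index_map)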
This function works fine and the result is what I expect. I then tried to implement a CUDA-equivalent version with this piece of code:
from numba import cuda

@cuda.jit
def sum_pixel_signals(pixels_signals, signals, index_map):
    it, ipix, itick = cuda.grid(3)
    if it < signals.shape[0] and ipix < signals.shape[1]:
        index = index_map[it][ipix]
        start_tick = track_starts[it] // consts.t_sampling
        if itick < signals.shape[2]:
            itime = int(start_tick + itick)
            # atomic add, since several threads may write to the same cell
            cuda.atomic.add(pixels_signals, (index, itime), signals[it][ipix][itick])
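For completeness, the 3D launch configuration is built roughly along these lines (the block shape here is a placeholder; per the Numba docs the convention is kernel[blocks_per_grid, threads_per_block]):

    import math

    # Placeholder block shape: 8 * 8 * 8 = 512 threads per block.
    threadsperblock = (8, 8, 8)
    blockspergrid = (
        math.ceil(signals.shape[0] / threadsperblock[0]),
        math.ceil(signals.shape[1] / threadsperblock[1]),
        math.ceil(signals.shape[2] / threadsperblock[2]),
    )

    # Numba expects the grid dimensions first, then the block dimensions.
    sum_pixel_signals[blockspergrid, threadsperblock](pixels_signals, d_signals, pixel_index_map)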
Unfortunately, when I call the kernel, I get this rather unhelpful error message:
ERROR:numba.cuda.cudadrv.driver:Call to cuMemcpyDtoH results in UNKNOWN_CUDA_ERROR
---------------------------------------------------------------------------
CudaAPIError Traceback (most recent call last)
<ipython-input-14-3786491325e7> in <module>
----> 1 sum_pixel_signals[threadsperblock,blockspergrid](pixels_signals, d_signals, pixel_index_map)
I don't understand what I am doing wrong. Is there a way to at least debug this kind of error message?
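From what I have read, something like the following might help localize the failure, but I am not sure it is the right approach (everything below is my assumption, not something I have verified on this kernel):

    import numpy as np
    from numba import cuda

    # 1. Run the whole script under the CUDA memory checker to catch
    #    out-of-bounds accesses inside the kernel:
    #      cuda-memcheck python my_script.py
    # 2. Force synchronous launches so the error is reported at the kernel
    #    launch itself instead of a later device-to-host copy:
    #      CUDA_LAUNCH_BLOCKING=1 python my_script.py
    # 3. Compile with debug info and synchronize right after the launch,
    #    shown here on a trivial stand-in kernel:

    @cuda.jit(debug=True, opt=False)
    def probe(out):
        i = cuda.grid(1)
        if i < out.shape[0]:
            out[i] = i

    d_out = cuda.device_array(128, dtype=np.int32)
    probe[1, 128](d_out)
    cuda.synchronize()  # any pending asynchronous launch error surfaces here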