
I am trying to implement this fairly straightforward function as a CUDA kernel using Numba:

@nb.njit(parallel=True)
def sum_pixel_signals_cpu(pixels_signals, signals, index_map):
    
    for it in nb.prange(signals.shape[0]):
        for ipix in nb.prange(signals.shape[1]):
            index = index_map[it][ipix]
            start_tick = track_starts[it] // consts.t_sampling
            for itick in nb.prange(signals.shape[2]):
                itime = int(start_tick+itick)
                pixels_signals[index, itime] += signals[it][ipix][itick]

This function works fine and the result is what I expect. I tried to implement a CUDA-equivalent version with this piece of code:

@cuda.jit
def sum_pixel_signals(pixels_signals, signals, index_map):
    it, ipix, itick = cuda.grid(3)
    if it < signals.shape[0] and ipix < signals.shape[1]:
        index = index_map[it][ipix]
        start_tick = track_starts[it] // consts.t_sampling
        if itick < signals.shape[2]:
            itime = int(start_tick+itick)
            cuda.atomic.add(pixels_signals, (index, itime), signals[it][ipix][itick])

Unfortunately, when I call the kernel, I get this not very helpful error message:

ERROR:numba.cuda.cudadrv.driver:Call to cuMemcpyDtoH results in UNKNOWN_CUDA_ERROR
---------------------------------------------------------------------------
CudaAPIError                              Traceback (most recent call last)
<ipython-input-14-3786491325e7> in <module>
----> 1 sum_pixel_signals[threadsperblock,blockspergrid](pixels_signals, d_signals, pixel_index_map)

I don't understand what I am doing wrong. Is there a way to at least debug this kind of error message?

talonmies
robsol90
  • 1. I believe questions like this are expected to have a [mcve]. 2. You should make sure you are not running into a WDDM TDR issue. 3. You seem to have your grid config arguments [backwards](https://numba.pydata.org/numba-doc/latest/cuda/kernels.html#kernel-invocation). 4. Issues like this are usually either a kernel timeout or a kernel illegal memory access. You can use a method like [this](https://stackoverflow.com/questions/27277365) to get some additional indication if there is an out-of-bounds access error, but source-level debug will be difficult due to the numba translation mechanism – Robert Crovella Sep 14 '20 at 22:19
  • You are absolutely right, the problem was the config arguments backwards. Unfortunately the error message wasn't very helpful... – robsol90 Sep 14 '20 at 23:08
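Since the driver error gives no hint about the launch configuration, a cheap pre-launch check on the host can catch a swapped configuration before the kernel runs. The following is only a sketch: `check_launch_config` is a hypothetical helper, not part of Numba, and the array shape and block size in the usage lines are assumed values, not the ones from the question. The limits used (at most 1024 threads per block, at most 64 threads along the third block axis) hold for current CUDA GPUs.

```python
import math

# Block-size limits for current CUDA GPUs (compute capability 2.0 and later).
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)


def check_launch_config(shape, blockspergrid, threadsperblock):
    """Hypothetical helper: raise ValueError if the launch config looks wrong."""
    total_threads = math.prod(threadsperblock)
    if total_threads > MAX_THREADS_PER_BLOCK:
        raise ValueError(
            f"{total_threads} threads per block exceeds {MAX_THREADS_PER_BLOCK}; "
            "are blockspergrid and threadsperblock swapped?"
        )
    for axis, (threads, limit) in enumerate(zip(threadsperblock, MAX_BLOCK_DIM)):
        if threads > limit:
            raise ValueError(f"block axis {axis} has {threads} threads, limit is {limit}")
    # Every array element needs at least one thread along each axis.
    for axis, (n, blocks, threads) in enumerate(
        zip(shape, blockspergrid, threadsperblock)
    ):
        if blocks * threads < n:
            raise ValueError(f"axis {axis}: {blocks * threads} threads for {n} elements")


# Assumed values, for illustration only.
signals_shape = (100, 50, 200)
threadsperblock = (8, 8, 8)
blockspergrid = (13, 7, 25)

# Passes silently with the arguments in the right roles.
check_launch_config(signals_shape, blockspergrid, threadsperblock)
```

With the two tuples swapped, the same call raises `ValueError`, because the tuple meant as the grid size would ask for far more than 1024 threads per block. A check like this only covers the launch configuration; it does not detect out-of-bounds indices inside the kernel.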

1 Answer


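The problem was caused by the launch configuration arguments being written backwards. Numba expects the number of blocks per grid first and the number of threads per block second. The correct call is: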

sum_pixel_signals[blockspergrid,threadsperblock](pixels_signals, d_signals, pixel_index_map)
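For reference, one common way to derive the two tuples is to pick the threads per block and round the number of blocks up so that the grid covers the whole array. This is a minimal sketch, not the code from the question: `blocks_per_grid` is a hypothetical helper, and the shape and block size below are assumed values.

```python
import math


def blocks_per_grid(shape, threadsperblock):
    """Hypothetical helper: number of blocks needed to cover `shape` on each axis."""
    return tuple(math.ceil(n / t) for n, t in zip(shape, threadsperblock))


# Assumed values, for illustration only; in the question the shape would
# come from signals.shape.
signals_shape = (100, 50, 200)
threadsperblock = (4, 4, 32)  # 4 * 4 * 32 = 512 threads per block

blockspergrid = blocks_per_grid(signals_shape, threadsperblock)

# The kernel is then launched with the grid size first:
# sum_pixel_signals[blockspergrid, threadsperblock](pixels_signals, d_signals, pixel_index_map)
```

Because the number of blocks is rounded up, some threads fall outside the array, which is why the bounds checks on `it`, `ipix` and `itick` inside the kernel are still needed.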
robsol90