According to the CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA threads in the current program executing in the same thread block as the current thread. These are suffixed with _block, e.g., atomicAdd_block."
However, I cannot use atomicAdd_block in my code, whereas the same code compiles fine with atomicAdd. Is there a header or library that I need to include or link against?
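For reference, here is a minimal sketch of the kind of call in question. The kernel name `count_threads`, the helper `count_with_block_atomics`, and the launch sizes are illustrative assumptions. The sketch also assumes the build targets compute capability 6.0 or higher (for example `nvcc -arch=sm_60 test.cu`), since the scoped atomics are only available on those devices. When compiled for an older architecture, the guard falls back to plain atomicAdd.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block counts its own threads in shared memory, then thread 0
// adds the per-block count to a device-wide total.
__global__ void count_threads(int *total)
{
    __shared__ int block_count;
    if (threadIdx.x == 0) block_count = 0;
    __syncthreads();

#if __CUDA_ARCH__ >= 600
    // Block-scoped atomic: only needs to be atomic within this thread block.
    atomicAdd_block(&block_count, 1);
#else
    // Fallback when the target architecture has no scoped atomics.
    atomicAdd(&block_count, 1);
#endif
    __syncthreads();

    // Device-wide atomic, because several blocks update the same address.
    if (threadIdx.x == 0) atomicAdd(total, block_count);
}

// Hypothetical host helper: launches the kernel and returns the total,
// or -1 if any CUDA call fails.
int count_with_block_atomics(int blocks, int threads)
{
    int *d_total = nullptr;
    int h_total = -1;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_total, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(d_total);
        return -1;
    }

    count_threads<<<blocks, threads>>>(d_total);

    // cudaMemcpy waits for the kernel to finish and reports launch errors.
    if (cudaMemcpy(&h_total, d_total, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_total = -1;
    }
    cudaFree(d_total);
    return h_total;
}

int main()
{
    int total = count_with_block_atomics(4, 128);
    printf("total = %d\n", total);
    return total < 0 ? 1 : 0;
}
```

No extra header is included beyond `cuda_runtime.h`. In this sketch, the only thing that changes between the two variants is the architecture the code is compiled for.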