How to determine whether cudaErrorIllegalAddress is due to the destination or source address?

Question

The documentation for cudaErrorIllegalAddress says:

The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.

In my C++ code, I called:

cudaMemcpy( dst, src, size, kind );

How do I determine whether it is the dst or src that is invalid memory?

Running cuda-memcheck reports no errors, and neither does valgrind. Given that this crash only happens when built with -g -G, what steps might I take to debug further? Is there a way in the device code to check whether a cudaErrorIllegalAddress has occurred, so I can divide and conquer to find where in the device code it fails?

This error doesn't have anything to do with the line of code you have excerpted other than being reported there. CUDA error reporting for device activity is asynchronous. Run your code with `cuda-memcheck` and you should get more clarity, see [here](https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218). Read the first line of what you have excerpted from the documentation, carefully. It is referring to the execution of device code. The `cudaMemcpy` is a host library call, and does not pertain to device code execution. — Robert Crovella, Mar 28 '19 at 19:19
@RobertCrovella Does the fact that this only happens when built with `-g -G` suggest anything useful to you? (I'm running cuda-memcheck now.) — WilliamKF, Mar 28 '19 at 19:25
Since `-G` affects the generation of device code, it's possible that device code compiled with or without it can act quite differently. The compiler does much more aggressive dead code removal (that code which has no impact on global or observable state) without `-G`, so if some of the "dead code" is making out-of-bounds accesses, that could be a possible explanation for this observation. But nothing "useful" for debug purposes comes out of it, to my knowledge. You may, for example, discover that even your non `-G` code is making illegal accesses when you run it under `cuda-memcheck`, or not — Robert Crovella, Mar 28 '19 at 19:28
@RobertCrovella Unfortunately, `cuda-memcheck` reported no errors, please see updated question. — WilliamKF, Mar 28 '19 at 19:30
You ran `cuda-memcheck` on the code that was compiled with `-G` and that reports errors when run by itself? — Robert Crovella, Mar 28 '19 at 19:34
@RobertCrovella Yes, `cuda-memcheck /my/path/AppBuiltWith-G args` reports no errors, just fails the same way with cuda reporting `cudaErrorIllegalAddress` error from device. — WilliamKF, Mar 28 '19 at 19:38
Looks like running under `cuda-gdb` is helpful because it got `CUDA Exception: Warp Illegal Address` and stopped execution in the device. — WilliamKF, Mar 28 '19 at 19:44

score 4 · Answer 1 · answered Mar 28 '19 at 21:14

The answer is neither. As per the documentation, cudaMemcpy will only return one of three status codes itself:

cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidMemcpyDirection

However, the documentation also notes the following:

Note that this function may also return error codes from previous, asynchronous launches.

And therein lies the likely source of the error, i.e. a prior kernel is producing a runtime error which is then being reported by the blocking API call.
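A minimal sketch of the point above, assuming a CUDA toolkit and a GPU are available. The names `writeKernel` and `isOwnMemcpyStatus` are hypothetical, and the kernel is deliberately handed a null pointer only to force an illegal store. Checking `cudaGetLastError()` and `cudaDeviceSynchronize()` right after the launch attributes the fault to the kernel; without that check, the same sticky error would first appear as the return value of the later `cudaMemcpy`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Status codes the cudaMemcpy documentation lists as its own return values.
// Anything else (e.g. cudaErrorIllegalAddress) came from earlier asynchronous work.
bool isOwnMemcpyStatus(cudaError_t err)
{
    return err == cudaSuccess ||
           err == cudaErrorInvalidValue ||
           err == cudaErrorInvalidMemcpyDirection;
}

// Hypothetical kernel: writes through whatever pointer it is given.
__global__ void writeKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = i;  // faults if `data` is not a valid device address
    }
}

int main()
{
    const int n = 32;
    int host[n] = {0};
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }

    // Deliberately pass a null pointer so the kernel performs an illegal store.
    writeKernel<<<1, n>>>(nullptr, n);

    // Errors in the launch itself (bad configuration, etc.) are reported here.
    cudaError_t launchErr = cudaGetLastError();
    // Errors raised while the kernel executes are reported once the host waits for it.
    cudaError_t syncErr = cudaDeviceSynchronize();
    std::printf("launch: %s\n", cudaGetErrorName(launchErr));
    std::printf("sync:   %s\n", cudaGetErrorName(syncErr));

    // Without the synchronize above, the fault would first surface here,
    // making it look as if cudaMemcpy's dst or src were to blame.
    cudaError_t copyErr =
        cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("memcpy: %s (one of cudaMemcpy's own codes: %s)\n",
                cudaGetErrorName(copyErr),
                isOwnMemcpyStatus(copyErr) ? "yes" : "no");

    cudaFree(dev);
    return 0;
}
```

Adding the same launch-then-synchronize check after each kernel launch in the real application is one way to divide and conquer: the first check that reports an error points at the kernel that faulted.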

