The "invalid memory access" error does not actually refer to the cudaMemcpyAsync
operation itself, so studying that call alone is unlikely to yield anything useful.
CUDA uses an asynchronous mechanism to report device-code execution errors "at the next opportunity" via the host API, so the error you are seeing could refer to any kernel execution that took place prior to that call.
To help localize the error, you can try specifying launch blocking when you run your code (for example, by setting the CUDA_LAUNCH_BLOCKING environment variable to 1). How useful this is will depend on exactly how the code is written, and on whether any error checking is being done after CUDA kernel launches. Whether or not you compile your code with `--lineinfo`, you can get additional localization information about the problem using the method indicated here (compiling with `--lineinfo` improves the granularity of that information).
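As a sketch of the kind of post-launch error checking mentioned above (the macro name and kernel are hypothetical, not from the code in question):

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with file/line info on any CUDA error.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void myKernel(int *data) { /* hypothetical kernel */ }

int main() {
    int *d_data;
    CHECK_CUDA(cudaMalloc(&d_data, 100 * sizeof(int)));

    myKernel<<<1, 100>>>(d_data);
    CHECK_CUDA(cudaGetLastError());      // catches launch-configuration errors
    CHECK_CUDA(cudaDeviceSynchronize()); // surfaces asynchronous execution errors here

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}
```

Running the program with CUDA_LAUNCH_BLOCKING=1 set in the environment has a similar localizing effect without adding explicit synchronization after every launch.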
The observation in the comment is a good one, and is perhaps an important clue to coding defects. I will note that:
- curious though it may be, as posted the transfer size is consistent with the allocation size, so the copy operation itself is unlikely to be throwing an error for that reason
- based on my experience with CUDA error reporting (i.e. familiarity with the error codes and their text translations), the "invalid memory access" error is attributable to a device-code execution error. If the CUDA runtime can determine that a given transfer size is inconsistent with an allocation size, the error reported is "invalid argument" instead.
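As an illustration of that distinction (buffer sizes and names are hypothetical): a transfer size the runtime can see is inconsistent with the allocation is rejected synchronously by the API call with "invalid argument", whereas a defect in device code surfaces asynchronously as an execution error:

```cuda
#include <cstdio>

__global__ void oobWrite(int *data) {
    data[1 << 24] = 42;  // out-of-bounds write in device code (hypothetical defect)
}

int main() {
    int *d_buf;
    cudaMalloc(&d_buf, 100 * sizeof(int));
    int h_buf[200];

    // Case 1: the runtime can detect the size mismatch against the
    // allocation -> "invalid argument", reported by the call itself.
    cudaError_t e1 = cudaMemcpy(h_buf, d_buf, 200 * sizeof(int),
                                cudaMemcpyDeviceToHost);
    printf("oversized copy: %s\n", cudaGetErrorString(e1));

    // Case 2: a device-code defect -> execution error, reported
    // asynchronously at the next opportunity (here, the synchronize).
    oobWrite<<<1, 1>>>(d_buf);
    cudaError_t e2 = cudaDeviceSynchronize();
    printf("after kernel:   %s\n", cudaGetErrorString(e2));
    return 0;
}
```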
You can take a look at section 12 of this online training series for a more in-depth treatment of CUDA error reporting, as well as debugging suggestions.