cudaMemcpyDeviceToHost() fails

Question

I have the following code (assume everything is defined properly):

#include "OurIncludes.h"
#include <ctime>

__global__ void kernel_testing(int *d_intersects, Circle *part1, Circle *part2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 10 && j < 10) {
        int index = i + j * 10;
        d_intersects[index] = part1[i].intersect(part2[j]);
    }
}


int main(void)
{
    dim3 GRID(1, 1);
    dim3 BLOCK(10, 10);

    short randomNum;
    RandObj randGenerator;
    Circle* obj = new Circle[10];
    Circle* obj2 = new Circle[10];
    Circle *d_obj;
    Circle *d_obj2;
    int intersects[100];
    int *d_intersects;

    if (cudaSuccess != cudaMalloc((void **)&d_obj, sizeof(Circle) * 10)) {
        fprintf(stderr, "Failed to allocate memory for d_obj\n");
    }
    if (cudaSuccess != cudaMalloc((void **)&d_obj2, sizeof(Circle) * 10)) {
        fprintf(stderr, "Failed to allocate memory for d_obj2\n");
    }
    if (cudaSuccess != cudaMalloc((void **)&d_intersects, sizeof(int) * 100)) {
        fprintf(stderr, "Failed to allocate memory for d_intersects\n");
    }
    for (int i = 0; i < 10; i++) {
        obj[i] = (*randGenerator.makeRandomCircle());
    }

    for (int i = 0; i < 10; i++) {
        obj2[i] = (*randGenerator.makeRandomCircle());
    }
    size_t size = sizeof(Circle);
    if (cudaSuccess != cudaMemcpy(d_obj, obj, size * 10, cudaMemcpyHostToDevice)) {
        fprintf(stderr, "Failed to copy data to d_obj\n");
    }
    if (cudaSuccess != cudaMemcpy(d_obj2, obj2, size * 10, cudaMemcpyHostToDevice)) {
        fprintf(stderr, "Failed to copy data to d_obj2\n");
    }

    kernel_testing<<<GRID, BLOCK>>>(d_intersects, d_obj, d_obj2);


    cudaError_t s = cudaMemcpy(intersects, d_intersects, sizeof(int) * 100, cudaMemcpyDeviceToHost);
    fprintf(stderr, "Error is: %s\n", cudaGetErrorString(s));
    cudaFree(d_intersects);
    cudaFree(d_obj);
    cudaFree(d_obj2);
    return 0;   
}

For some reason, the code always fails at the device-to-host cudaMemcpy, and I cannot see why it should. I've tried launching with different objects (triangles, spheres, etc.), but it always fails when I copy data back from the device to the host. Any help or suggestions are appreciated; I'm very new to CUDA programming. Thanks.

EDIT: The error code says that an illegal memory access was encountered, but I don't see why that should happen.

EDIT 2: So I've removed all the double pointers and "flattened" my arrays, yet I still have the same problem. I'm completely out of ideas now.

Is there an error message or error code that tells you what went wrong? Without that it would be hard for any of us to tell you what went wrong. — RaGe, Apr 30 '15 at 21:54
All your code tells you is that the return value is not cudaSuccess. The failure can be one of three errors; see here: http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__MEMORY_g48efa06b81cc031b2aa6fdc2e9930741.html Use either a debugger or more logging to figure out which error code you're getting. — RaGe, Apr 30 '15 at 21:56
@RaGe can I `cout` the value of `cudaMemcpy` directly to console? Would that give me anything? — Tarun Verma, Apr 30 '15 at 22:00
It's an enum, so you'll probably just get a number, which is not a lot of help. No harm in trying, though. Why not write 4 if statements to see which error code it matches? — RaGe, Apr 30 '15 at 22:03
I've edited my code to enter the conditions, but I feel like I'm doing something wrong, because none of the `cout`s print anything. I'm very new to this, so I apologize, but I'd love it if you could help me out of this. — Tarun Verma, Apr 30 '15 at 22:11
Provide a [complete code](http://stackoverflow.com/help/mcve). SO [expects that for questions like this](http://stackoverflow.com/help/on-topic). Use [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). You'll be surprised at how it will help guide you, and others trying to help you. — Robert Crovella, Apr 30 '15 at 22:15
I'm sorry, I'll keep that in mind from next time onwards. Error string says: "an illegal memory access was encountered" I can't see why this should happen, though. — Tarun Verma, Apr 30 '15 at 22:18
See now, that is a much better question. check if this helps: http://stackoverflow.com/questions/25702573/simple-cuda-test-always-fails-with-an-illegal-memory-access-was-encountered-er — RaGe, Apr 30 '15 at 22:28
It's happening because you're not passing the double pointers to the kernel correctly. That is particularly challenging in cuda. When those invalid double pointers get dereferenced in the kernel code, you get an illegal memory address encountered. — Robert Crovella, Apr 30 '15 at 22:55
@RobertCrovella So I've removed all the double pointers and "flattened" my arrays, yet I still have the same problem. I'm completely out of ideas now. — Tarun Verma, Apr 30 '15 at 23:12
@RobertCrovella error handling on the kernel side is as non-existent as ever in CUDA? — RaGe, Apr 30 '15 at 23:56
I don't think I can provide any further assistance if you don't want to provide a complete code. @RaGe I have no idea what you mean. The reported error (invalid address encountered) is effectively a seg fault in the kernel. It's certainly possible to get more granularity on this, as described [here](http://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218?s=1|0.3178#27278218) (and/or with a debugger), but there is only one line of code in this kernel that could be generating such a "seg fault" (dereferencing an invalid pointer). — Robert Crovella, May 01 '15 at 01:18
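To illustrate the error-checking advice in the comments above, here is a minimal sketch of wrapping runtime calls so that a failure prints a readable string instead of a bare enum value. `CUDA_CHECK` and `roundtrip_checked` are hypothetical names made up for this example; they are not part of the CUDA API or of the asker's code. Only `cudaMalloc`, `cudaMemcpy`, `cudaFree`, and `cudaGetErrorString` come from the runtime.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): on failure, print the
// failing expression, source location, and the runtime's error string, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Copies 100 ints to the device and back; returns 0 if the data round-trips intact.
int roundtrip_checked(void)
{
    int h_in[100], h_out[100];
    for (int i = 0; i < 100; i++) { h_in[i] = i; h_out[i] = -1; }

    int *d_buf = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_buf, sizeof(h_in)));
    CUDA_CHECK(cudaMemcpy(d_buf, h_in, sizeof(h_in), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(h_out, d_buf, sizeof(h_out), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_buf));

    for (int i = 0; i < 100; i++) {
        if (h_out[i] != h_in[i]) return 1;
    }
    return 0;
}

int main(void)
{
    return roundtrip_checked();
}
```

With a wrapper like this, each failing call reports its own file and line, so there is no need for a chain of if statements comparing against individual error codes.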

score -2 · Answer 1 · answered Apr 30 '15 at 23:28

I think I once ran into a similar problem; my solution was:

cudaError_t status = cudaMemcpy(devPtr, srcPtr, size * sizeof(int), cudaMemcpyHostToDevice);
if (status == cudaSuccess) { ... }

Try not comparing the function's return value to cudaSuccess directly; store it in a variable first.

CoreMeltdown

The problem is dereferencing an invalid pointer in the kernel. It has nothing to do with `cudaMemcpy` operations. The question title is misleading because the error is reported asynchronously from the kernel and shows up on the next CUDA API call, which happens to be `cudaMemcpy` in this case. – Robert Crovella May 01 '15 at 01:22
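
The asynchronous reporting described in this comment can be made visible by checking for errors right after the launch and again after synchronizing, so that a kernel fault is attributed to the kernel rather than to a later `cudaMemcpy`. The following is a minimal sketch, not the asker's code: the `fill` kernel and the `launch_and_check` function are hypothetical stand-ins, and the launch configuration is an arbitrary choice for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own index, guarded against overrun.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Launches the kernel and separates launch errors from execution errors.
// Returns 0 when both checks pass.
int launch_and_check(void)
{
    const int n = 100;
    int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill<<<1, 128>>>(d_out, n);

    // Errors detected at launch time (for example, an invalid configuration).
    cudaError_t launchErr = cudaGetLastError();
    if (launchErr != cudaSuccess) {
        fprintf(stderr, "kernel launch: %s\n", cudaGetErrorString(launchErr));
    }

    // Errors raised while the kernel runs (for example, an illegal memory
    // access) are reported once the host waits for the device.
    cudaError_t execErr = cudaDeviceSynchronize();
    if (execErr != cudaSuccess) {
        fprintf(stderr, "kernel execution: %s\n", cudaGetErrorString(execErr));
    }

    cudaFree(d_out);
    return (launchErr == cudaSuccess && execErr == cudaSuccess) ? 0 : 1;
}

int main(void)
{
    return launch_and_check();
}
```

If the kernel dereferenced an invalid pointer, the "kernel execution" message would be expected to fire here, before any copy back to the host is attempted. Running the program under a memory checker such as cuda-memcheck (compute-sanitizer in newer toolkits) can narrow the fault down further.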
