Transferring an array pointer into CUDA memory via separate class

Question

I have a class named "Coordinate" which consists of an int array pointer and a bool variable. I want to send this pointer to the CUDA device, modify the array there, and then use it back in CPU memory.

Here is Coordinate.h :

#ifndef __COORDINATE_H
#define __COORDINATE_H

#include <stdlib.h>
#include <cuda.h>

using namespace std;

class Coordinate {

public:
    int *array_pointer;
    bool flag;

    Coordinate() { flag = false; }

    Coordinate(int array_length) {
        flag = false;
        array_pointer = new int[array_length];
        for (int i = 0; i < array_length; i++) {
            array_pointer[i] = -1;
        }
    }
};

#endif

I have made 2 global functions in cudamain.cu, check1 and check2; both take a pointer to a Coordinate as argument. check1 changes only the boolean flag, while check2 changes the boolean flag and also modifies the array.

Here is cudamain.cu :

#include <iostream>
#include <cuda.h>
#include "Coordinate.h"

using namespace std;

__global__ void check1(Coordinate *c) {
    c->flag = true;
}

__global__ void check2(Coordinate *c) {
    c->flag = true;
    for (int i = 0; i < 10; i++) {
        c->array_pointer[i] = i;
    }
}


int main() {
    Coordinate *d_a, *d_b, a, b;
    a = Coordinate(10); b = Coordinate(10);

    size_t size = sizeof(Coordinate);

    cudaMalloc((void**)&d_a, size); cudaMalloc((void**)&d_b, size);
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice); cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    check1<<<1, 1>>>(d_a);
    cudaMemcpy(&a, d_a, size, cudaMemcpyDeviceToHost);
    cout <<"d_a result-> " <<a.flag <<" " <<a.array_pointer[9] << endl;

    check2<<<1, 1>>>(d_b);
    cudaMemcpy(&b, d_b, size, cudaMemcpyDeviceToHost);
    cout << "d_b result-> " << b.flag << " " << b.array_pointer[9] << endl;
    return 0;
}

I made 2 separate coordinate objects a and b, a will go with check1 and b will go with check2. Both a and b are initialized in same way.

The result I get is

d_a result-> 1 -1
d_b result-> 0 -1

Expected result:

d_a result-> 1 -1
d_b result-> 1 9

Different Coordinate objects may have different array length so I can't initialize the array pointer in the coordinate class.

1. use [proper cuda error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). 2. run your code with `cuda-memcheck`. When I do that, I see that the `check2` kernel is making an illegal write access — Robert Crovella, Jul 07 '17 at 17:39
a class with an embedded pointer requires special handling in CUDA. study [this](https://stackoverflow.com/a/15435592/1695960) answer. — Robert Crovella, Jul 07 '17 at 17:42
@RobertCrovella, Is it legal to access a host pointer from within a kernel? — Sullivan Risk, Jul 07 '17 at 22:25
@SullivanRisk I believe that question and others are answered in the other answer I linked. This question is arguably a duplicate of that, insofar as that answer covers exactly this case (a class with an embedded pointer) and explains what to do to manage it correctly. Even apart from that or this example of a class, it is a fundamental CUDA principle that host pointers should not be dereferenced in device code. — Robert Crovella, Jul 08 '17 at 01:13

einpoklum · Answer 1 · 2017-07-08T14:51:40.713

You cannot access host memory from a CUDA kernel by dereferencing a pointer to it, unless that piece of memory was specially allocated to allow this, e.g. using cudaMallocManaged(). Here, check2 dereferences array_pointer, which still holds a host address obtained from new, so your program cannot work. Read this Parallel4All post on accessing the same memory both from the host and the device. Another alternative is the one @RobertCrovella linked to, involving allocating device-side memory.
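To make the device-side allocation route concrete, here is a minimal sketch, not necessarily what the linked answer does line for line. It assumes a simplified `Coordinate` struct without constructors; `Check2Result`, `run_check2`, and the `staging` variable are hypothetical names introduced for illustration. The idea is to allocate the array on the device separately, store that device address in a host-side copy of the struct, and only then copy the struct over.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Simplified stand-in for the question's class (no constructors, for brevity).
struct Coordinate {
    int *array_pointer;
    bool flag;
};

__global__ void check2(Coordinate *c, int n) {
    c->flag = true;
    for (int i = 0; i < n; i++) {
        c->array_pointer[i] = i;  // valid only because array_pointer is a device address
    }
}

// Hypothetical result type for this sketch.
struct Check2Result {
    bool ok;    // true if every CUDA call succeeded
    bool flag;  // flag as written by the kernel
    int last;   // last array element after the round trip
};

Check2Result run_check2(int n) {
    Check2Result result = {false, false, -1};
    if (n <= 0) return result;

    std::vector<int> h_array(n, -1);  // host data, initialized as in the question
    const std::size_t bytes = n * sizeof(int);
    int *d_array = nullptr;
    Coordinate *d_c = nullptr;
    Coordinate staging = {nullptr, false};  // host struct that will carry a device pointer

    // Allocate the struct and the array separately, then copy the array contents.
    bool ok = cudaMalloc(&d_array, bytes) == cudaSuccess
           && cudaMalloc(&d_c, sizeof(Coordinate)) == cudaSuccess
           && cudaMemcpy(d_array, h_array.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Patch the embedded pointer before copying the struct itself.
        staging.array_pointer = d_array;
        ok = cudaMemcpy(d_c, &staging, sizeof(Coordinate), cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        check2<<<1, 1>>>(d_c, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          // Copy the struct back for the flag, and the array back via the device pointer.
          && cudaMemcpy(&staging, d_c, sizeof(Coordinate), cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_array.data(), d_array, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_c);
    cudaFree(d_array);

    result.ok = ok;
    result.flag = staging.flag;
    result.last = h_array[n - 1];
    return result;
}
```

Note that after the struct is copied back, `staging.array_pointer` is still a device address, so the array contents are fetched with a second cudaMemcpy rather than by dereferencing it on the host. With cudaMallocManaged() the separate copies would not be needed, at the cost of relying on unified memory support.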

But, frankly, I doubt either of these two options is what you should go for in this case, since a class named Coordinate does not seem to be something which would need a variable-size array of integers. Are you sure something like

template <unsigned NumDimensions>
class Coordinate {
    std::array<int, NumDimensions> a;
    // etc. etc.
};

won't do?

(Note that the std::array class itself cannot really be used in device code, like most of the standard library. But you can easily clone std::array and then use your cuda::array class on both the host and the device side.)
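As a rough illustration of what such a clone could look like, the sketch below defines a bare-bones array wrapper whose accessors are marked `__host__ __device__`. The names `device_array`, `FixedCoordinate`, `fill`, and `round_trip` are made up for this example. Because the struct contains no pointers, a single cudaMemcpy in each direction moves everything.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical minimal stand-in for std::array, usable on host and device.
template <typename T, std::size_t N>
struct device_array {
    T data[N];
    __host__ __device__ T &operator[](std::size_t i) { return data[i]; }
    __host__ __device__ const T &operator[](std::size_t i) const { return data[i]; }
    __host__ __device__ constexpr std::size_t size() const { return N; }
};

// Fixed-size coordinate: the array lives inside the object, no embedded pointer.
template <unsigned NumDimensions>
struct FixedCoordinate {
    device_array<int, NumDimensions> a;
    bool flag;
};

template <unsigned NumDimensions>
__global__ void fill(FixedCoordinate<NumDimensions> *c) {
    c->flag = true;
    for (unsigned i = 0; i < NumDimensions; i++) {
        c->a[i] = static_cast<int>(i);
    }
}

// Copies c to the device, runs the kernel, and copies the result back into c.
// Returns true if every CUDA call succeeded.
template <unsigned NumDimensions>
bool round_trip(FixedCoordinate<NumDimensions> &c) {
    FixedCoordinate<NumDimensions> *d_c = nullptr;
    bool ok = cudaMalloc(&d_c, sizeof(c)) == cudaSuccess
           && cudaMemcpy(d_c, &c, sizeof(c), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        fill<NumDimensions><<<1, 1>>>(d_c);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&c, d_c, sizeof(c), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_c);
    return ok;
}
```

The trade-off is that the length must be known at compile time, so this only fits if the set of array lengths in use is small and fixed.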

Even if dynamic allocation of memory is required for some reason, it is not a good idea to have a class which, it seems, would be used many times, allocate its own memory. Consider using some pre-allocated buffer and have your Coordinates just advance an offset into it (although this would require synchronization for thread safety, or making the buffer thread-local).
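One possible shape for the pre-allocated buffer idea is sketched below, under the assumption that all lengths are known on the host before the kernel launch. `PooledCoordinate`, `fill_all`, and `fill_pooled` are hypothetical names. Offsets are assigned by a single host thread, which sidesteps the synchronization concern; handing out offsets concurrently would need an atomic counter or per-thread buffers instead.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Each coordinate records where its elements live in one shared pool.
struct PooledCoordinate {
    int offset;  // index of the first element in the pool
    int length;  // number of elements owned by this coordinate
    bool flag;
};

// One thread per coordinate; each writes only its own slice of the pool.
__global__ void fill_all(PooledCoordinate *coords, int count, int *pool) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= count) return;
    coords[idx].flag = true;
    for (int i = 0; i < coords[idx].length; i++) {
        pool[coords[idx].offset + i] = i;
    }
}

// Builds coordinates for the given lengths, runs the kernel, and copies results back.
// Returns true if every CUDA call succeeded.
bool fill_pooled(const std::vector<int> &lengths,
                 std::vector<PooledCoordinate> &coords,
                 std::vector<int> &pool) {
    coords.clear();
    int total = 0;
    for (int len : lengths) {
        coords.push_back({total, len, false});  // advance the offset by each length
        total += len;
    }
    pool.assign(total, -1);
    if (coords.empty()) return true;

    const int count = static_cast<int>(coords.size());
    const std::size_t coord_bytes = coords.size() * sizeof(PooledCoordinate);
    const std::size_t pool_bytes = pool.size() * sizeof(int);
    PooledCoordinate *d_coords = nullptr;
    int *d_pool = nullptr;

    // Two allocations in total, regardless of how many coordinates there are.
    bool ok = cudaMalloc(&d_coords, coord_bytes) == cudaSuccess
           && cudaMalloc(&d_pool, pool_bytes) == cudaSuccess
           && cudaMemcpy(d_coords, coords.data(), coord_bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_pool, pool.data(), pool_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (count + threads - 1) / threads;
        fill_all<<<blocks, threads>>>(d_coords, count, d_pool);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(coords.data(), d_coords, coord_bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(pool.data(), d_pool, pool_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_coords);
    cudaFree(d_pool);
    return ok;
}
```

Since the coordinates hold integer offsets rather than pointers, the same structs are meaningful on both host and device, and no pointer patching is needed when copying them.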

have you actually tried using a `std::array` entity in device code? Because I tried what you are showing, and can't get it to work. — Robert Crovella, Jul 08 '17 at 14:13
@RobertCrovella: Ah, yes, indeed that does not work, and OP would need to pass something else to the device. See edit — einpoklum, Jul 08 '17 at 14:49
