
So I've got a problem that has got me stuck for a little while now. I'm using NSight Eclipse Edition (CUDA 7.0) for programming on a GT 630 (Kepler version) GPU.

Basically, I have an array of a class (Static_Box), and I modify the data on the host (CPU). I then want to send the data over to the GPU to do computation, but my code is not doing that. Here's some of my code:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CUDA_CHECK_RETURN is from NVIDIA's bit reverse example (exits the application if a CUDA call fails).
// Minimal stand-in with the same behavior:
#define CUDA_CHECK_RETURN(value) {                                     \
    cudaError_t _m_cudaStat = (value);                                 \
    if (_m_cudaStat != cudaSuccess) {                                  \
        fprintf(stderr, "Error %s at line %d in file %s\n",            \
                cudaGetErrorString(_m_cudaStat), __LINE__, __FILE__);  \
        exit(1);                                                       \
    } }

#define SIZE_OF_BOX_ARRAY 3

class Edge {
public:
    int x1, y1, x2, y2;
};

class Static_Box {
public:
    Static_Box(int x, int y, int width, int height);
    Edge e1, e2, e3, e4;
};

Static_Box::Static_Box(int x, int y, int width, int height) {
    e1.x1 = x;
    e1.y1 = y;
    e1.x2 = x+width;
    e1.y2 = y;
    // e2.x1 = x+width;  Continuing in this manner (no other calculations)
}

// Storage of the scene. d_* indicates GPU memory
// Static_Box is a class I have defined in another file, it contains a
// few other classes that I wrote as well.
Static_Box *static_boxes;
Static_Box *d_static_boxes;

int main(int argc, char **argv) {
    // Create the host data storage
    static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY*sizeof(Static_Box));

    // I then set a few of the indexes of static_boxes here, which is
    // the data I need written while on the CPU.
    // Example (arguments elided):
    // static_boxes[0] = Static_Box(...);

    // Allocate the memory on the GPU
    CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));

    // Copy each host element into the matching device slot
    for (int j = 0; j < SIZE_OF_BOX_ARRAY; j++) {
    //  Removed this, per Mai Longdong's suggestion:
    //    CUDA_CHECK_RETURN(cudaMalloc((void**)&(static_boxes[j]), sizeof(Static_Box)));
        CUDA_CHECK_RETURN(cudaMemcpy(&(d_static_boxes[j]), &(static_boxes[j]), sizeof(Static_Box), cudaMemcpyHostToDevice));
    }

    CUDA_CHECK_RETURN(cudaFree(d_static_boxes));
    free(static_boxes);
    return 0;
}

I've hunted around on here for quite a while and found some helpful information from Robert Crovella. I progressed a little using his tips, but the answers he gave did not quite pertain to my problem. Does anybody know how to transfer this array to the GPU with the host data intact?

Thanks very much for your help!

Edit: included the change to the first cudaMalloc suggested by Mai Longdong.

Edit 2: included the second change from Mai Longdong and provided a complete example.

Gliderman
  • Don't use `malloc` in C++. Use `new` if you really require dynamic allocation, but in this example you don't; use `std::array`. Also your `cudaMalloc` allocates `sizeof(static_boxes)` bytes, which is the size of **a pointer**, which is not what you want. And lastly the second `cudaMalloc` stores its result in `static_boxes`, not `d_static_boxes`. – user703016 Aug 05 '15 at 01:04
  • Okay, getting there. Thanks for pointing out the `sizeof(static_boxes)` problem; I've swapped it over to `SIZE_OF_BOX_ARRAY * sizeof(Static_Box)`. I just tried changing the second `cudaMalloc` to use `d_static_boxes`, but it is giving me a SIGBUS: Bus error. I'm going to work on copying the data back from the GPU now and see how that goes. Thanks for your input @MaiLongdong! – Gliderman Aug 05 '15 at 01:18
  • That's a thinko: you can't `cudaMalloc` into a device pointer. I don't even know why I said that; it's not even Monday morning. Drop the second `cudaMalloc` altogether. Also, maybe you should get a book on C++, because you seem quite confused about basic semantics. – user703016 Aug 05 '15 at 01:24
  • Unless `Static_Box` contains pointers (whose definition you haven't shown) you are done after the first `cudaMalloc`. Writing a question where the extent of your *actual problem description* is "I'm having trouble doing that" is quite unclear, especially when coupled with the fact that you haven't provided an MCVE, which SO [expects for questions like this.](http://stackoverflow.com/help/on-topic) (I've voted to close this question for lack of MCVE.) If `Static_Box` does contain pointers, then the code is quite a bit more complicated. Try [this](http://stackoverflow.com/questions/15431365) – Robert Crovella Aug 05 '15 at 01:36
  • @MaiLongdong Sorry about that, I was basing the second `cudaMalloc` off of [link](http://stackoverflow.com/questions/14284964/cuda-how-to-allocate-memory-for-data-member-of-a-class) but I see now where that was creating an array for the class itself. I do have a book on C++, I guess I haven't gotten to the part that I have here yet. I am also thinking about picking up a CUDA book as well, but it's not something bookstores really carry :) Thanks very much for your help again, I applied similar code to copy the data back to the host, and it works! – Gliderman Aug 05 '15 at 01:44
  • @RobertCrovella I was writing the last comment when you posted yours, sorry. `Static_Box` does not contain pointers, I had forgotten to mention that. I'll edit to make it a full example, give me a moment... – Gliderman Aug 05 '15 at 01:53
  • Putting "Solved" in the question title is not appropriate on SO. Instead, upvote or mark one of the answers as accepted, or else provide your own answer and accept that. That is the SO way to mark a question "Solved". By the way, I removed my close vote as you've now provided something that approximates an MCVE (although it still has uncompilable junk in it.) – Robert Crovella Aug 05 '15 at 02:10
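The pointer-size mistake pointed out in the comments is easy to reproduce on the host alone: `sizeof` applied to a pointer gives the size of the pointer variable, not of the array it refers to, so a copy sized that way moves only the first few bytes. A minimal sketch, assuming the question's `Edge`/`Static_Box` layout and using `std::memcpy` as a host-only stand-in for `cudaMemcpy`; the helper names `pointer_bytes`, `array_bytes` and `copy_bytes` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define SIZE_OF_BOX_ARRAY 3

// Layout assumed from the question, with members made public for brevity.
struct Edge { int x1, y1, x2, y2; };
struct Static_Box { Edge e1, e2, e3, e4; };

// What sizeof(static_boxes) measures: the pointer variable itself.
std::size_t pointer_bytes(Static_Box *p) { return sizeof(p); }

// What the allocation and the copy actually need: the whole array.
std::size_t array_bytes(std::size_t count) { return count * sizeof(Static_Box); }

// Host-only stand-in for cudaMemcpy: copies exactly `bytes` bytes.
void copy_bytes(void *dst, const void *src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}
```

Copying `pointer_bytes(...)` bytes (typically 8 on a 64-bit host) fills only the start of the first element and leaves the later elements untouched; copying `array_bytes(SIZE_OF_BOX_ARRAY)` bytes reproduces the source array exactly.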

1 Answer


If Static_Box contains no pointers (member data referred to by pointers that would require independent allocations), then copying an array of them is not really any different from copying an array of POD types, like int. This should be all you need:

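#include <cstdlib>
#include <cuda_runtime.h>
// Static_Box and CUDA_CHECK_RETURN as defined in the question

#define SIZE_OF_BOX_ARRAY 3

Static_Box *static_boxes;
Static_Box *d_static_boxes;

int main(int argc, char **argv) {
    static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY*sizeof(Static_Box));
    CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));
    // One call copies the whole array: element count times element size
    CUDA_CHECK_RETURN(cudaMemcpy(d_static_boxes, static_boxes, SIZE_OF_BOX_ARRAY*sizeof(Static_Box), cudaMemcpyHostToDevice));

    CUDA_CHECK_RETURN(cudaFree(d_static_boxes));
    free(static_boxes);
    return 0;
}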

If you think that is not working, you'll need to give a specific example of what you are doing and what exactly led you to believe that it is not working (data not matching, a CUDA runtime error being thrown, etc.). The example you provide should be complete, so that someone else can compile it, run it, and see whatever problem you are reporting. If the code you post in your question doesn't compile, it's not an MCVE (my opinion, which influences my voting pattern).
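The "no pointers" condition above can be partly checked at compile time. In this sketch, `std::is_trivially_copyable` confirms that a flat byte copy such as `cudaMemcpy` is valid for the type, and the question's four-argument constructor does not affect that. It does not catch raw pointer members, though: those are trivially copyable too, but would carry host addresses to the device, so the pointer check still has to be done by inspection. The round trip uses `std::memcpy` as a host-only stand-in for the host-to-device and device-to-host copies, and `copy_boxes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

#define SIZE_OF_BOX_ARRAY 3

struct Edge { int x1, y1, x2, y2; };

// Mirrors the question's Static_Box. e2..e4 are left zero-initialized
// because the question elides how they are computed.
struct Static_Box {
    Static_Box() = default;
    Static_Box(int x, int y, int width, int height) {
        e1.x1 = x;
        e1.y1 = y;
        e1.x2 = x + width;
        e1.y2 = y;
        (void)height;  // would be used for e2..e4, elided in the question
    }
    Edge e1{}, e2{}, e3{}, e4{};
};

// A raw byte copy (memcpy or cudaMemcpy) is only valid for trivially copyable types.
// This does NOT detect raw pointer members; rule those out by inspection.
static_assert(std::is_trivially_copyable<Static_Box>::value,
              "Static_Box must be trivially copyable to be copied as raw bytes");

// Hypothetical helper: one bulk copy of n boxes, standing in for cudaMemcpy.
void copy_boxes(Static_Box *dst, const Static_Box *src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(Static_Box));
}
```

Copying a host array into a staging array and back with `copy_boxes` should leave the result byte-for-byte equal to the original, which is the same check worth doing with the real `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost` calls.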

Robert Crovella
  • Wow, looks like it all traces back to me thinking that the pointer was the actual size of the array. Switching back to the old way of copying (not using the `for` loop) works as you describe. Thanks for all your help! I'm going to mark this as the accepted answer. – Gliderman Aug 05 '15 at 02:13
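For what it's worth, once the device allocation is sized correctly, the question's per-element loop and the answer's single bulk copy move the same bytes; the bulk version just issues one transfer instead of SIZE_OF_BOX_ARRAY of them, and since each `cudaMemcpy` call carries its own fixed overhead, one large copy is usually preferable. A host-only sketch, using `std::memcpy` as a stand-in for `cudaMemcpy` so it runs without a GPU; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define SIZE_OF_BOX_ARRAY 3

// Layout assumed from the question, with members made public for brevity.
struct Edge { int x1, y1, x2, y2; };
struct Static_Box { Edge e1, e2, e3, e4; };

// The question's pattern: one copy call per element, dst[j] <- src[j].
void copy_per_element(Static_Box *dst, const Static_Box *src, std::size_t n) {
    for (std::size_t j = 0; j < n; j++)
        std::memcpy(&dst[j], &src[j], sizeof(Static_Box));
}

// The answer's pattern: a single copy of n * sizeof(Static_Box) bytes.
void copy_bulk(Static_Box *dst, const Static_Box *src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(Static_Box));
}
```

Both functions produce identical destination arrays; the difference on a real GPU is only the number of transfers issued.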