I've been stuck on a problem for a little while now. I'm using Nsight Eclipse Edition (CUDA 7.0) to program a GT 630 (Kepler version) GPU.
Basically, I have an array of a class (Static_Box) whose data I modify on the host (CPU). I then want to send that data to the GPU for computation, but my code is not doing that. Here's some of my code:
#define SIZE_OF_BOX_ARRAY 3
class Edge {
public:
int x1, y1, x2, y2;
};
class Static_Box {
public:
Static_Box(int x, int y, int width, int height);
Edge e1, e2, e3, e4;
};
Static_Box::Static_Box(int x, int y, int width, int height) {
e1.x1 = x;
e1.y1 = y;
e1.x2 = x+width;
e1.y2 = y;
// e2.x1 = x+width; Continuing in this manner (no other calculations)
}
// Storage of the scene. d_* indicates GPU memory.
// Static_Box is a class I have defined in another file; it contains a
// few other classes that I wrote as well.
Static_Box *static_boxes;
Static_Box *d_static_boxes;
int main(int argc, char **argv) {
// Create the host data storage
static_boxes = (Static_Box*)malloc(SIZE_OF_BOX_ARRAY*sizeof(Static_Box));
// I then set a few of the indexes of static_boxes here, which is
// the data I need written while on the CPU.
// Example:
static_boxes[0] = Static_Box(
// Allocate the memory on the GPU
// CUDA_CHECK_RETURN is from NVIDIA's bit reverse example (exits the application if the GPU fails)
CUDA_CHECK_RETURN(cudaMalloc((void**)&d_static_boxes, SIZE_OF_BOX_ARRAY * sizeof(Static_Box)));
int j = 0;
for (; j < SIZE_OF_BOX_ARRAY; j++) {
// Removed this per Mai Longdong's suggestion
// CUDA_CHECK_RETURN(cudaMalloc((void**)&(static_boxes[j]), sizeof(Static_Box)));
CUDA_CHECK_RETURN(cudaMemcpy(&(d_static_boxes[j]), &(static_boxes[j]), sizeof(Static_Box), cudaMemcpyHostToDevice));
}
}
I've hunted around on here for quite a while and found some helpful information from Robert Crovella. I made a little progress using his tips, but the answers he gave did not quite pertain to my problem. Does anybody have a solution that keeps the host data intact while transferring it to the GPU?
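For reference, here is a minimal sketch of the transfer described above, not a diagnosis of the code in the question. It assumes Static_Box is a plain, pointer-free type, so the contiguous array can go over in a single cudaMemcpy instead of one call per element. cudaMemcpy only reads from the host source buffer, so the host data stays intact. Because a device pointer cannot be dereferenced from host code, the sketch checks the result with a small kernel that reads a field on the GPU and copies it back. CUDA_CHECK, read_x2, and device_sees_host_boxes are hypothetical names, and the values for e2 to e4 are my own guess at "continuing in this manner". It should build with something like nvcc -std=c++11.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define SIZE_OF_BOX_ARRAY 3

// Hypothetical stand-in for CUDA_CHECK_RETURN: exit on any CUDA error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

struct Edge {
    int x1, y1, x2, y2;
};

struct Static_Box {
    Static_Box(int x, int y, int width, int height) {
        e1.x1 = x;         e1.y1 = y;          e1.x2 = x + width; e1.y2 = y;
        // Assumption: the remaining edges follow the same pattern.
        e2.x1 = x + width; e2.y1 = y;          e2.x2 = x + width; e2.y2 = y + height;
        e3.x1 = x + width; e3.y1 = y + height; e3.x2 = x;         e3.y2 = y + height;
        e4.x1 = x;         e4.y1 = y + height; e4.x2 = x;         e4.y2 = y;
    }
    Edge e1, e2, e3, e4;
};

// Reads one field of each box on the device so the host can verify the copy.
__global__ void read_x2(const Static_Box *boxes, int n, int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = boxes[i].e1.x2;
}

// Returns true if the values read on the GPU match the host array.
bool device_sees_host_boxes() {
    const int n = SIZE_OF_BOX_ARRAY;

    // Fill the host array on the CPU.
    std::vector<Static_Box> boxes;
    for (int i = 0; i < n; ++i) boxes.push_back(Static_Box(10 * i, 0, 5, 5));

    Static_Box *d_boxes = NULL;
    int *d_out = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_boxes, n * sizeof(Static_Box)));
    CUDA_CHECK(cudaMalloc((void **)&d_out, n * sizeof(int)));

    // One copy for the whole contiguous array; the host buffer is only read.
    CUDA_CHECK(cudaMemcpy(d_boxes, boxes.data(), n * sizeof(Static_Box),
                          cudaMemcpyHostToDevice));

    read_x2<<<1, n>>>(d_boxes, n, d_out);
    CUDA_CHECK(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    std::vector<int> out(n);
    CUDA_CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(int),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_boxes));
    CUDA_CHECK(cudaFree(d_out));

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (out[i] != boxes[i].e1.x2) ok = false;
    }
    return ok;
}
```

Calling device_sees_host_boxes() from main and printing the result is one way to confirm that the data reached the device. If Static_Box held pointers to other host allocations, this flat copy would not be enough, since the pointed-to data would need its own device allocation and copy.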
Thanks very much for your help!
Edit: included the change to the first cudaMalloc from Mai Longdong.
Edit 2: included the second change from Mai Longdong and provided a complete example.