Copy huge structure of arrays to GPU

Question

I need to port an existing SPH (Smoothed Particle Hydrodynamics) code so that it can run on a GPU.

Unfortunately, it has many data structures that I need to copy from the CPU to the GPU. I searched the web and thought I had written the copying code correctly, but I get an error (something about an unhandled exception).

When I opened the debugger, I saw that no data reaches the variables that should be copied to the GPU. It just says "The memory could not be read".

So here is an example of one data structure that needs to be copied to the GPU:

__device__ struct d_particle_data
{
  float Pos[3];         /*!< particle position at its current time */
  float PosMap[3];      /*!< initial boundary particle positions */
  float Mass;           /*!< particle mass */
  float Vel[3];         /*!< particle velocity at its current time */
  float GravAccel[3];       /*!< particle acceleration due to gravity */
}*d_P;

and I copy it to the GPU with the following:

cudaMalloc((void**)&d_P, N*sizeof(sph_particle_data));
cudaMemcpy(d_P, P, N*sizeof(d_sph_particle_data), cudaMemcpyHostToDevice);

The data structure P has the same layout as the data structure d_P. Can anybody help me?

EDIT

So, here's a pretty small part of that code:

First, the headers I have to use in the code:

Allvars.h: Variables that I need on the host

struct particle_data { float a; float b; } *P;

proto.h: Header with all the functions

extern void main_GPU(int N, int Ntask);

Allvars_gpu.h: all the variables that have to be on the GPU

__device__ struct d_particle_data { float a; float b; } *d_P;

Now I call the .cu file from the .cpp file, hydra.cpp:

#include <stdio.h>
#include <cuda_runtime.h>


extern "C" {
#include "proto.h"
}

int main(void) {
int N_gas = 100; // Number of particles
int NTask = 1; // Number of CPUs (Code has MPI-stuff included)
main_GPU(N_gas,NTask);
return 0;
}

The actual work happens in the .cu file, hydro_gpu.cu:

#include <cuda_runtime.h>
#include <stdio.h>

extern "C" {
#include "Allvars_gpu.h"
#include "allvars.h"
#include "proto.h"
}

__device__ void hydro_evaluate(int target, int mode, struct d_particle_data *P) {
int c = 5;
float a,b;
a = P[target].a;
b = P[target].b;
P[target].a = a+c;
P[target].b = b+c;
}


__global__ void hydro_particle(struct d_particle_data *P) {
int i = threadIdx.x + blockIdx.x*blockDim.x;
hydro_evaluate(i,0,P);
}


void main_GPU(int N, int Ntask) {
int Blocks;
cudaMalloc((void**)&d_P, N*sizeof(d_particle_data));
cudaMemcpy(d_P, P, N*sizeof(d_particle_data), cudaMemcpyHostToDevice);
Blocks = (N+N-1)/N;

hydro_particle<<<Blocks,N>>>(d_P);

cudaMemcpy(P, d_P, N*sizeof(d_particle_data), cudaMemcpyDeviceToHost);
cudaFree(d_P);
}

What are `sph_particle_data` and `d_sph_particle_data`? You must provide a short, complete example someone else could compile and analyse if you want help here. — talonmies, Feb 05 '16 at 13:24
In which exact call do you get the error? I suppose you check the return value of each of the CUDA methods, so you can exactly tell which call failed, right? — EmDroid, Feb 05 '16 at 13:41
We don't use `cudaMalloc` or `cudaMemcpy` with `__device__` tagged variables. Any time you are having trouble with a CUDA code, you should use [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api), and SO [expects](http://stackoverflow.com/help/on-topic) an [MCVE](http://stackoverflow.com/help/mcve) for questions like this (why isn't my code working?) — Robert Crovella, Feb 05 '16 at 14:00
I added a little example ... sorry that it took me so long, but it was a bit difficult to make it small ... :) — EllaPropella, Feb 08 '16 at 17:19
What was requested was an [MCVE](http://stackoverflow.com/help/mcve) <--click here and read. A code that someone else could compile. What you've provided can't be compiled. — Robert Crovella, Feb 09 '16 at 04:31
Oops, sorry! It was late yesterday. I edited it again, it's compiling now ... — EllaPropella, Feb 09 '16 at 13:00
When I compile the code you have here, I get 4 warnings of the form: "t1070.cu(37): warning: a __device__ variable "d_P" cannot be directly read in a host function" Do you get any warnings like that? You should not ignore such warnings. You can fix it by removing the `__device__` tag from the `d_P` definition as @talonmies indicated. Furthermore, you haven't provided any allocation for the variable `P` in your host code. You can't `cudaMemcpy` from `P` to `d_P` when `P` is an unallocated pointer. After you fix those items, run your code with `cuda-memcheck`. — Robert Crovella, Feb 10 '16 at 03:39

score 1 · Answer 1 · answered Feb 10 '16 at 01:59

The really short answer is probably not to declare `*d_P` as a static `__device__` symbol. Those cannot be passed as device pointer arguments to `cudaMalloc`, `cudaMemcpy`, or kernel launches, and your use of `__device__` is both unnecessary and incorrect in this example.

If you make that change, your code might start working. Note that I lost interest in trying to actually compile your MCVE code some time ago, and there might well be other problems, but I'm too bored with this question to look for them. This answer has mostly been added to get this question off the unanswered queue for the CUDA tag.
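
For illustration, here is a minimal sketch of what that change could look like, folding in the points raised in the comments (allocate and initialise `P` on the host, check every runtime call). It is not the asker's full code: the `CUDA_CHECK` macro, the extra `n` kernel argument with its bounds check, the block size of 128 and the initial values are assumptions made for this example only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the question): abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// One plain struct definition, shared by host and device code.
struct particle_data { float a; float b; };

__global__ void hydro_particle(particle_data *p, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {              // guard threads past the end of the array
        p[i].a += 5.0f;
        p[i].b += 5.0f;
    }
}

int main()
{
    const int N = 100;                               // number of particles
    const int threads = 128;                         // assumed block size
    const int blocks = (N + threads - 1) / threads;  // round up

    // The host array must be allocated and initialised before copying.
    particle_data *P = (particle_data *)malloc(N * sizeof(particle_data));
    if (P == NULL) return EXIT_FAILURE;
    for (int i = 0; i < N; ++i) { P[i].a = (float)i; P[i].b = 2.0f * i; }

    // d_P is an ordinary host variable that holds a device address:
    // no __device__ qualifier.
    particle_data *d_P = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_P, N * sizeof(particle_data)));
    CUDA_CHECK(cudaMemcpy(d_P, P, N * sizeof(particle_data),
                          cudaMemcpyHostToDevice));

    hydro_particle<<<blocks, threads>>>(d_P, N);
    CUDA_CHECK(cudaGetLastError());        // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors during kernel execution

    CUDA_CHECK(cudaMemcpy(P, d_P, N * sizeof(particle_data),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_P));

    printf("P[0] = (%g, %g)\n", P[0].a, P[0].b);
    free(P);
    return 0;
}
```

The kernel can still read and write every particle through the pointer it receives as an argument, so nothing about the struct or the pointer needs to be tagged `__device__` for this pattern to work.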

talonmies


Haha, OK, your advice is not really helping me, it doesn't change anything ;). I think my problem goes a little deeper, but I will try to figure it out myself (with the hints from Robert Crovella), as I don't want to bore you to tears. Sorry for wasting your time, my dear. – EllaPropella Feb 10 '16 at 12:41
@EllaPropella: As I said, there are other mistakes in your code. For example, nowhere in what you have posted do you show any allocation or initialisation of `*P`. That should probably cause a segfault or CUDA runtime error. But despite having asked this 5 days ago and edited it 3 times, you still haven't managed to provide code someone else could actually compile and run, nor adequately described what the exact problem is. I don't understand how you expect a more concise or useful answer than that which I have provided. – talonmies Feb 10 '16 at 15:01
Hm, OK. It's compiling on my computer ... Now I am confused. Anyway, I put the `__device__` in front of the structure because I wanted to be able to access the struct, which holds all the properties of the particles, from the device function where the computation of the hydrodynamics takes place, and then, after the computation, copy it back to the CPU, where other stuff happens. Sorry for the confusion, but the code I have to edit is really long and nested as hell, so it's really not easy to find a simple example ... – EllaPropella Feb 10 '16 at 16:24
