I need to port an existing SPH (Smoothed Particle Hydrodynamics) code so that it can run on a GPU.
Unfortunately, it has a lot of data structures that I need to copy from the CPU to the GPU. I searched the web and thought my copy code was correct, but I get an error (an unhandled exception).
When I opened the debugger, I saw that no data is passed to the variables that should be copied to the GPU. It just says "The memory could not be read".
Here is an example of one data structure that needs to be copied to the GPU:
__device__ struct d_particle_data
{
    float Pos[3]; /*!< particle position at its current time */
    float PosMap[3]; /*!< initial boundary particle positions */
    float Mass; /*!< particle mass */
    float Vel[3]; /*!< particle velocity at its current time */
    float GravAccel[3]; /*!< particle acceleration due to gravity */
}*d_P;
and I copy it to the GPU with the following:
cudaMalloc((void**)&d_P, N*sizeof(d_particle_data));
cudaMemcpy(d_P, P, N*sizeof(d_particle_data), cudaMemcpyHostToDevice);
The host data structure P has the same layout as the data structure d_P. Can anybody help me?
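To narrow down where the copy fails, the allocate/copy/copy-back sequence above can be wrapped so that every CUDA call reports its own error string. This is only a minimal sketch under assumptions that are not in my real code: the host array is allocated by the caller, the device pointer is an ordinary local variable (not declared `__device__`), and the names `particle`, `CHECK` and `round_trip` are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the particle struct: plain data, no __device__ qualifier.
struct particle {
    float Pos[3];
    float Mass;
};

// Hypothetical helper: print the CUDA error string and return 1 from the
// enclosing function if the call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Copies n particles from h_P to the GPU and back again.
// Returns 0 on success, 1 if any CUDA call fails.
// Sketch only: an early return after cudaMalloc skips cudaFree.
int round_trip(particle *h_P, int n)
{
    particle *d_P = NULL;                       // host variable holding a device address
    size_t bytes = (size_t)n * sizeof(particle);

    CHECK(cudaMalloc((void **)&d_P, bytes));
    CHECK(cudaMemcpy(d_P, h_P, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(h_P, d_P, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_P));
    return 0;
}
```

Calling `round_trip(h_P, N)` on a host array allocated with `malloc(N * sizeof(particle))` should leave the data unchanged and return 0; a non-zero return together with the message on stderr shows which call is the first to fail.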
EDIT
So, here's a pretty small part of that code:
First, the headers I have to use in the code:
Allvars.h: Variables that I need on the host
struct particle_data { float a; float b; } *P;
proto.h: Header with all the functions
extern void main_GPU(int N, int Ntask);
Allvars_gpu.h: all the variables that have to be on the GPU
__device__ struct d_particle_data { float a; float b; } *d_P;
Now I call into the .cu file from the .cpp file, hydra.cpp:
#include <stdio.h>
#include <cuda_runtime.h>
extern "C" {
#include "proto.h"
}
int main(void) {
    int N_gas = 100; // Number of particles
    int NTask = 1; // Number of CPUs (Code has MPI-stuff included)
    main_GPU(N_gas,NTask);
    return 0;
}
The actual work takes place in the .cu file, hydro_gpu.cu:
#include <cuda_runtime.h>
#include <stdio.h>
extern "C" {
#include "Allvars_gpu.h"
#include "allvars.h"
#include "proto.h"
}
__device__ void hydro_evaluate(int target, int mode, struct d_particle_data *P) {
    int c = 5;
    float a,b;
    a = P[target].a;
    b = P[target].b;
    P[target].a = a+c;
    P[target].b = b+c;
}
__global__ void hydro_particle(struct d_particle_data *P) {
    int i = threadIdx.x + blockIdx.x*blockDim.x;
    hydro_evaluate(i,0,P);
}
void main_GPU(int N, int Ntask) {
    int Blocks;
    cudaMalloc((void**)&d_P, N*sizeof(d_particle_data));
    cudaMemcpy(d_P, P, N*sizeof(d_particle_data), cudaMemcpyHostToDevice);
    Blocks = (N+N-1)/N;
    hydro_particle<<<Blocks,N>>>(d_P);
    cudaMemcpy(P, d_P, N*sizeof(d_particle_data), cudaMemcpyDeviceToHost);
    cudaFree(d_P);
}
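In main_GPU, N is used both as the particle count and as the block size, so the launch always uses a single block of N threads. That is fine for N = 100, but as far as I know a block is limited to 1024 threads on current hardware, so the real particle counts will need a fixed block size. A minimal sketch of that launch pattern, where the block size of 256 and the names `add_offset`, `blocks_for` and `launch_add_offset` are assumptions and not part of my code:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: adds c to every element of x.
__global__ void add_offset(float *x, int n, float c)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)              // the last block may contain threads past the end
        x[i] += c;
}

// Ceiling division: number of blocks needed to cover n elements.
static int blocks_for(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches add_offset on a device array d_x of n floats and returns the
// launch status (for example, an invalid configuration error).
cudaError_t launch_add_offset(float *d_x, int n, float c)
{
    const int threadsPerBlock = 256;   // assumed block size
    add_offset<<<blocks_for(n, threadsPerBlock), threadsPerBlock>>>(d_x, n, c);
    return cudaGetLastError();
}
```

With this pattern the kernel needs the element count as a parameter, because the grid can contain more threads than there are particles.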