I want to copy an array of structures from host to device in different ways. I am able to copy the full structure from host to device, but I am unable to copy the individual elements of the structure when one of the elements is a pointer variable. I get a seg fault when I try this. The problem occurs for the pointer variable but not for the normal variable.

I debugged and identified that the error occurs on the line below.

cudaMemcpy(d_s[i].data,h_s[i].data,sizeof(float*),cudaMemcpyHostToDevice);

I am unable to resolve this issue.

#include<iostream>

using namespace std;

struct structure
{
   int count;
   float *data;
};


structure * fillStructure(int n,  float *tdata )
{

   structure *h_s;
   h_s = (structure *) malloc( n * sizeof(structure));

   for(int i =0; i< n; i++)
   {
           h_s[i].count =i;
           h_s[i].data = &tdata[i];
   }
   cout<<"Input:\n";
   for(int i=0; i<n ;i++)
   {
       cout<<h_s[i].count<<"\t";
   }
   cout<<endl;
   for(int i=0; i<n ;i++)
   {
       cout<<*(h_s[i].data)<<"\t";
   }
   cout<<endl;
   structure *d_s;

   cudaMalloc((void**)&d_s, n * sizeof(structure));
   for(int i=0; i<n ;i++)
   {
       cudaMemcpy(&d_s[i].count,&h_s[i].count, sizeof(int), cudaMemcpyHostToDevice);
       cudaMemcpy(d_s[i].data,h_s[i].data,sizeof(float *),cudaMemcpyHostToDevice);
   }
 return d_s;

}

int main()
{
   int N =5;
   float *ldata;
   ldata = (float*) malloc(N * sizeof(float));
   for(int i=0 ; i< N ; i++)
   {
    ldata[i] =i*i;
   }
        structure *ps = fillStructure(N, ldata);

        structure *ls;
        ls =(structure *) malloc( N  * sizeof(structure));
        cudaMemcpy(ls,ps,N * sizeof(structure),cudaMemcpyDeviceToHost);

   cout<<"Result:\n";
   for(int i=0; i< N;i++)
   {
       cout<<ls[i].count<<"\t";
   }
   cout<<endl;
   for(int i =0 ; i< N; i++)
   {
     cout<<*(ls[i].data)<<"\t";
   }
   cout<<endl;

}

The expected output is

Input:
0   1   2   3   4   
0   1   4   9   16  
Result:
0   1   2   3   4   
0   1   4   9   16  

But the actual output is

Input:
0   1   2   3   4   
0   1   4   9   16  
Segmentation fault (core dumped)

Thanks in advance

talonmies
  • You're just confusing yourself by over-complicating what you're trying to do. `cudaMemcpy()` copies bytes. Think about where in host memory your data is, and where you want it to go on the device - and copy between those two places. – einpoklum Oct 15 '19 at 19:45

1 Answer

Regarding this line:

cudaMemcpy(d_s[i].data,h_s[i].data,sizeof(float *),cudaMemcpyHostToDevice);

What you have asked cudaMemcpy to do is:

  1. retrieve the source pointer from h_s[i].data
  2. using that pointer, retrieve a float * quantity from the location that that pointer points to
  3. retrieve the destination pointer from d_s[i].data
  4. using that pointer, store the quantity retrieved in step 2, to whatever location the pointer from step 3 points to.

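There are a variety of problems with this, but the most basic issue is that this is not really what you want to do. In particular, step 3 makes host code read d_s[i].data, which dereferences a device pointer on the host; that is the likely source of the seg fault.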

The thing you are trying to copy is the pointer value contained in (source location) h_s[i].data, and you want to store it in (destination) d_s[i].data. In order to make that work, you must pass pointers to those two locations (source and destination), not the values stored in them.

You can fix that by adding an ampersand to both items:

cudaMemcpy(&(d_s[i].data),&(h_s[i].data),sizeof(float *),cudaMemcpyHostToDevice);

That should fix the seg fault. And it will correctly copy the numerical value of the pointer contained in h_s[i].data to d_s[i].data.
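
As a rough check of the ampersand fix, here is a minimal sketch that copies both members one at a time and then reads the whole array back to compare the stored pointer values. The helper names copyMembersToDevice and pointerValuesRoundTrip are hypothetical, not from the question, and the test only confirms that the numerical pointer value arrives intact; it does not make that pointer usable in device code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct structure {
    int count;
    float *data;
};

// Hypothetical helper: copy each member of h_s[i] into d_s[i] separately.
// &d_s[i].count and &d_s[i].data are only address arithmetic on the device
// pointer d_s; the host never reads device memory to compute them.
bool copyMembersToDevice(structure *d_s, const structure *h_s, int n) {
    for (int i = 0; i < n; i++) {
        if (cudaMemcpy(&d_s[i].count, &h_s[i].count, sizeof(int),
                       cudaMemcpyHostToDevice) != cudaSuccess) return false;
        if (cudaMemcpy(&d_s[i].data, &h_s[i].data, sizeof(float *),
                       cudaMemcpyHostToDevice) != cudaSuccess) return false;
    }
    return true;
}

// Hypothetical check: copy member by member, read the array back, and
// compare the stored values with the originals.
bool pointerValuesRoundTrip(int n) {
    float *values = (float *)malloc(n * sizeof(float));
    structure *h_s = (structure *)malloc(n * sizeof(structure));
    structure *back = (structure *)calloc(n, sizeof(structure));
    for (int i = 0; i < n; i++) {
        values[i] = (float)(i * i);
        h_s[i].count = i;
        h_s[i].data = &values[i];  // host address, as in the question
    }

    structure *d_s = nullptr;
    bool ok = cudaMalloc((void **)&d_s, n * sizeof(structure)) == cudaSuccess;
    ok = ok && copyMembersToDevice(d_s, h_s, n);
    ok = ok && cudaMemcpy(back, d_s, n * sizeof(structure),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    // Compare the pointer VALUES only; back[i].data is still a host address.
    for (int i = 0; ok && i < n; i++)
        ok = back[i].count == h_s[i].count && back[i].data == h_s[i].data;

    cudaFree(d_s);
    free(back);
    free(h_s);
    free(values);
    return ok;
}

int main() {
    printf("%s\n", pointerValuesRoundTrip(5) ? "pointer values match"
                                             : "copy failed");
    return 0;
}
```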

However, as I've stated previously to you, the pointer you are copying from the source location is set here:

       h_s[i].data = &tdata[i];

and that is a pointer to a location in host memory. Such a pointer cannot be safely used in CUDA device code, so there really isn't much sense in working hard to copy that pointer to the device correctly. It will be useless in device code.

You still haven't grasped the necessity of a CUDA deep copy to make this scheme work. As stated previously, that is covered in a step-by-step fashion here.
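
For illustration, a minimal sketch of one way such a deep copy could look for this structure, assuming each data member should point at a single float as in the question. The names readThroughPointers, deepCopyAndVerify, staging, d_data and d_out are hypothetical and chosen only for this example; it is not necessarily the approach of the linked walkthrough.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct structure {
    int count;
    float *data;
};

// Device code dereferences the embedded pointers, so they must hold
// device addresses by the time this kernel runs.
__global__ void readThroughPointers(const structure *s, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = *(s[i].data);
}

bool deepCopyAndVerify(int n) {
    float *h_data = (float *)malloc(n * sizeof(float));
    float *h_out = (float *)calloc(n, sizeof(float));
    structure *staging = (structure *)malloc(n * sizeof(structure));
    for (int i = 0; i < n; i++) h_data[i] = (float)(i * i);

    // Step 1: copy the pointed-to floats into their own device buffer.
    float *d_data = nullptr;
    bool ok = cudaMalloc((void **)&d_data, n * sizeof(float)) == cudaSuccess;
    ok = ok && cudaMemcpy(d_data, h_data, n * sizeof(float),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 2: build a host-side staging array whose data members hold
    // DEVICE addresses (pointer arithmetic only, no dereference on the host).
    for (int i = 0; i < n; i++) {
        staging[i].count = i;
        staging[i].data = d_data + i;
    }

    // Step 3: copy the patched structures to the device in one transfer.
    structure *d_s = nullptr;
    ok = ok && cudaMalloc((void **)&d_s, n * sizeof(structure)) == cudaSuccess;
    ok = ok && cudaMemcpy(d_s, staging, n * sizeof(structure),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 4: have the device follow the pointers, then fetch the results.
    float *d_out = nullptr;
    ok = ok && cudaMalloc((void **)&d_out, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        readThroughPointers<<<(n + 127) / 128, 128>>>(d_s, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(h_out, d_out, n * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    for (int i = 0; ok && i < n; i++) ok = h_out[i] == h_data[i];

    cudaFree(d_out);
    cudaFree(d_s);
    cudaFree(d_data);
    free(staging);
    free(h_out);
    free(h_data);
    return ok;
}

int main() {
    printf("%s\n", deepCopyAndVerify(5) ? "deep copy verified"
                                        : "deep copy failed");
    return 0;
}
```

Note that the structures read back from the device would contain device addresses in data, so host code would have to copy the floats back separately rather than dereference those pointers.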

Robert Crovella