
I have the following (simplified) code in my .cu file:

typedef struct
{
    int leg_id;
    int contract_id;
} CudaLeg;

typedef struct
{
    CudaLeg* legs;
    int no_legs;
} CudaPortfolio;

extern "C"
__global__ void kernel(CudaPortfolio* o_portfolios_cuda, const int no_portfolios)
{
//    fill o_portfolios_cuda with data
}

void cudaFunction(CudaPortfolio* o_portfolios, unsigned long long no_portfolios)
{
    CudaPortfolio* o_portfolios_cuda;
    cudaMalloc((void **)& o_portfolios_cuda, sizeof(CudaPortfolio) * no_portfolios);

    kernel<<<32, 32>>>(o_portfolios_cuda, no_portfolios);

    cudaMemcpy(o_portfolios, o_portfolios_cuda, sizeof(CudaPortfolio) * no_portfolios, cudaMemcpyDeviceToHost);

    //printf below works
    printf("CPU no legs strike output portfolio: %d\n", o_portfolios[0].no_legs);
    //printf below crashes the program
    printf("CPU Leg 1 multiplier output portfolio: %d\n", o_portfolios[0].legs[0].multiplier);

    cudaFree(o_portfolios_cuda);
}

The GPU is a GTX580 (sm_20). The kernel can work with o_portfolios_cuda without problems: it fills it with data and does calculations with it. The first printf, of o_portfolios[0].no_legs, prints the correct value. But when I try to access one of the portfolio's legs on the host (o_portfolios[0].legs[0].multiplier), the program crashes. Any ideas how I can fix this? Thank you.

@Robert Crovella I already tried something like that, but it didn't work. I tried it again and added

    CudaLeg* o_portfolios_legs_cuda;
    cudaMalloc((void **)& o_portfolios_legs_cuda, sizeof(CudaLeg));
    cudaMemcpy(o_portfolios_legs_cuda, o_portfolios->legs, sizeof(CudaLeg), cudaMemcpyHostToDevice);
    cudaMemcpy(&(o_portfolios_cuda->legs), &o_portfolios_legs_cuda, sizeof(CudaLeg *), cudaMemcpyHostToDevice);

But now the program crashes on the 3rd line I just added, the call cudaMemcpy(o_portfolios_legs_cuda, ...).

@MarkoR The number of CudaLeg objects per portfolio is not fixed.

Snels Nick
  • This topic has been covered many times; it's called a deep copy. For example, take a look [here](http://stackoverflow.com/questions/16024087/copy-an-object-to-device/16024373#16024373) – Robert Crovella Jun 20 '16 at 11:38
  • @RobertCrovella Thank you for the link. I tried it, but now my program crashes on the first cudaMemcpy that the post suggests to add. I modified my question with the added code. Can you tell me what I did wrong? Thank you. – Snels Nick Jun 20 '16 at 12:36

2 Answers

You are allocating space for the CudaPortfolio structs, each of which holds one int and one CudaLeg pointer. But you are not allocating space for the CudaLeg array that the pointer is supposed to point to. So when you try to access it, the program crashes.

How to fix it: If you are only going to have one CudaLeg per portfolio, you can drop the pointer and just have a CudaLeg leg member inside CudaPortfolio. If you are going to have a fixed number of CudaLeg objects, you can have for example "CudaLeg legs[5]" inside CudaPortfolio. If the number of CudaLeg objects is not fixed and you want to keep the struct as it is right now, you need an additional malloc for each portfolio's legs, sized by its number of legs, and you need to assign the result to that portfolio's legs pointer. See the link that Robert Crovella posted in the comments to see how it is done.
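
As a rough illustration of the last option (one extra allocation per portfolio), here is a minimal sketch. The helper name allocDevicePortfolios and the legs_per_portfolio array are hypothetical and not part of the question; the sketch assumes the leg counts are known on the host before the kernel runs.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct { int leg_id; int contract_id; } CudaLeg;
typedef struct { CudaLeg* legs; int no_legs; } CudaPortfolio;

// Hypothetical helper (not from the question): builds a device array of
// portfolios in which every portfolio owns its own device-side legs buffer.
// legs_per_portfolio[i] is an assumed host array with the leg count of
// portfolio i. Returns a device pointer, or NULL on failure.
CudaPortfolio* allocDevicePortfolios(const int* legs_per_portfolio, int no_portfolios)
{
    // Host staging copy of the structs; calloc leaves every legs pointer NULL.
    CudaPortfolio* staging = (CudaPortfolio*)calloc(no_portfolios, sizeof(CudaPortfolio));
    if (staging == NULL) return NULL;

    CudaPortfolio* d_portfolios = NULL;
    bool ok = cudaMalloc((void**)&d_portfolios,
                         sizeof(CudaPortfolio) * no_portfolios) == cudaSuccess;

    for (int i = 0; ok && i < no_portfolios; ++i) {
        // The additional allocation: device storage for this portfolio's legs.
        CudaLeg* d_legs = NULL;
        ok = cudaMalloc((void**)&d_legs,
                        sizeof(CudaLeg) * legs_per_portfolio[i]) == cudaSuccess;
        staging[i].legs = ok ? d_legs : NULL;  // device address, stored on the host
        staging[i].no_legs = legs_per_portfolio[i];
    }

    // Copy the structs, which now carry device pointers, to the device.
    ok = ok && cudaMemcpy(d_portfolios, staging,
                          sizeof(CudaPortfolio) * no_portfolios,
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (!ok) {
        // Roll back everything allocated so far; cudaFree(NULL) is a no-op.
        for (int i = 0; i < no_portfolios; ++i) cudaFree(staging[i].legs);
        cudaFree(d_portfolios);
        d_portfolios = NULL;
    }
    free(staging);
    return d_portfolios;
}
```

The returned pointer could then be passed to the kernel in place of o_portfolios_cuda, so that the kernel writes into legs buffers that actually exist. Each legs buffer has to be released with its own cudaFree before the portfolio array itself is freed.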

MarkoR

When you copy the pointer legs, which points to a device memory address, back to host memory, you also need to change it to a host address that holds a host copy of the original device data.
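
A minimal sketch of that pointer fix-up might look like the following. The helper name copyPortfoliosToHost is hypothetical, and the sketch assumes that every legs pointer stored in the device structs refers to a valid device buffer holding no_legs elements.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct { int leg_id; int contract_id; } CudaLeg;
typedef struct { CudaLeg* legs; int no_legs; } CudaPortfolio;

// Hypothetical helper (not from the question): copies the portfolio structs
// from device to host, then replaces each device legs pointer with a host
// buffer that contains a copy of the legs. Returns 0 on success and -1 on
// failure; after a failure the contents of h_portfolios should not be used.
int copyPortfoliosToHost(CudaPortfolio* h_portfolios,
                         const CudaPortfolio* d_portfolios,
                         int no_portfolios)
{
    // Shallow copy: no_legs is usable afterwards, but legs is still a
    // device address and must not be dereferenced on the host.
    if (cudaMemcpy(h_portfolios, d_portfolios,
                   sizeof(CudaPortfolio) * no_portfolios,
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        return -1;

    for (int i = 0; i < no_portfolios; ++i) {
        CudaLeg* d_legs = h_portfolios[i].legs;  // device address
        size_t bytes = sizeof(CudaLeg) * h_portfolios[i].no_legs;
        if (bytes == 0) {                        // nothing to copy
            h_portfolios[i].legs = NULL;
            continue;
        }

        CudaLeg* h_legs = (CudaLeg*)malloc(bytes);
        if (h_legs == NULL) return -1;

        // Deep copy: fetch the legs themselves from the device.
        if (cudaMemcpy(h_legs, d_legs, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
            free(h_legs);
            return -1;
        }

        // Fix-up: the host struct now points at host memory.
        h_portfolios[i].legs = h_legs;
    }
    return 0;
}
```

After a successful call, an expression such as o_portfolios[0].legs[0].leg_id reads host memory. The device addresses are overwritten in the host structs, so the device buffers have to be freed through the pointers kept elsewhere, and each host legs buffer needs its own free().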

kangshiyin