How do I copy a single struct into global memory?

Question

I want to copy a set of initialization values that every thread uses into __global__ memory. I have collected them in a single struct. However, I am running into several problems getting it into __global__ memory. First, VS2015 tells me that "dynamic initialization is not supported for a __constant__ variable" for this line: __constant__ initValsStruct d_initVals;

Second, it tells me that there is "no suitable conversion function from initValsStruct to const void *" in this line: cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));

This might be quite a basic C or CUDA question, but what is the best way to copy a single struct to __global__ memory?

I tried the code below. It is based on a sample I found on the CUDA dev forum, where some __constant__ memory (an int array of 1024 elements) is initialized in the same way.

typedef struct
{
    unsigned int voxels_x = 0;
    unsigned int voxels_y = 0;
    unsigned int voxels_z;

    //Input and output data amounts
    unsigned int n_lines;
    unsigned int TD_samples;

    //amount of total calculations
    unsigned int n_calc;
} initValsStruct;

initValsStruct h_initVals; //host struct to be copied into __global__ memory
__constant__ initValsStruct d_initVals; //where it has to be copied to

int main(){
    //here I initialize every element of the initValsStruct h_initVals, so it is initialized

    cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));
}

This is how I access it:

typedef struct
{
    int device = 0;
    double  *d_xre, *d_xim, //input device arrays
            *d_yre, *d_yim, //output device arrays
            *h_xre_pl, *h_xim_pl, //page locked input host arrays
            *h_yre_pl, *h_yim_pl; //page locked output host arrays
} IOdataPtr;

__device__ void computation(int currentComputation, IOdataPtr ptr) //actual computation kernel
{
    int index;

    for (int i = 0; i < d_initVals.n_lines * PARAMETERS_PER_LINE; i++) {
        index = currentComputation * d_initVals.n_lines * PARAMETERS_PER_LINE + i;
        ptr.d_yre[index] = ptr.d_xre[index];
        ptr.d_yim[index] = ptr.d_xim[index];
    }
}

I would expect it to compile and run the same way it does when I pass the initVals struct as an argument to the kernel.

How is that duplicate related. The `__constant__` structure in your question contains no pointers — talonmies, Aug 12 '19 at 14:03

score -1 · Answer 1 · edited Aug 14 '19 at 10:44

Reading your code, it's unclear to me what you're trying to do. But your question was "I want to copy a set of initialization values that every thread uses into global memory", so I'm going to answer that question in a very direct way. Data is copied from the host to the device via the cudaMemcpy functions. A worked-out example is below.

The struct:

typedef struct
{
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;

    // Input and output data amounts
    unsigned int n_lines;
    unsigned int TD_samples;

    // amount of total calculations
    unsigned int n_calc;
} initValsStruct;

Initialize it on the host and copy it to the device with cudaMemcpy:

int main(void) {
    initValsStruct h_params;
    initValsStruct *d_params;
    h_params.n_calc = 10;
    // etc. initialization

    // Copy struct to device
    cudaMemcpy(d_params, &h_params, sizeof(initValsStruct), cudaMemcpyHostToDevice);

    // Struct d_params now has whatever values were in h_params. 
    // Unlike this example, be sure to use proper error-checking 
    // for all CUDA API calls

    // some kernel calls

    // done
    return 0;
}
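
Note that d_params has to point to allocated device memory before cudaMemcpy can write to it. A minimal, self-contained sketch of the whole sequence could look like the following. The kernel useParams and the CUDA_CHECK macro are hypothetical names made up for this illustration; they are not part of the code above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original code): abort on any CUDA error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

typedef struct
{
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;
    unsigned int n_lines;
    unsigned int TD_samples;
    unsigned int n_calc;
} initValsStruct;

// Hypothetical kernel: reads one field through the global-memory pointer.
__global__ void useParams(const initValsStruct *params, unsigned int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = params->n_calc;
    }
}

int main(void)
{
    initValsStruct h_params = {};  // zero-initialize on the host
    h_params.n_calc = 10;

    // Allocate device memory BEFORE copying into it.
    initValsStruct *d_params = nullptr;
    unsigned int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_params, sizeof(initValsStruct)));
    CUDA_CHECK(cudaMalloc(&d_out, sizeof(unsigned int)));

    CUDA_CHECK(cudaMemcpy(d_params, &h_params, sizeof(initValsStruct),
                          cudaMemcpyHostToDevice));

    useParams<<<1, 32>>>(d_params, d_out);
    CUDA_CHECK(cudaGetLastError());  // catches launch errors

    // This blocking copy also waits for the kernel to finish.
    unsigned int h_out = 0;
    CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));
    printf("n_calc read on device: %u\n", h_out);

    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFree(d_params));
    return 0;
}
```

The struct is passed to the kernel by pointer here; for a struct this small, passing it by value as a kernel argument would work as well.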

You could also use cudaMallocManaged, which is convenient and a little cleaner. I highly recommend it.
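
As a rough sketch of the managed-memory variant (assuming a device and toolkit that support unified memory; the kernel name readParams and the field values are made up for illustration), the host writes the struct directly and no explicit copy is needed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct
{
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;
    unsigned int n_lines;
    unsigned int TD_samples;
    unsigned int n_calc;
} initValsStruct;

// Hypothetical kernel: combines two fields read from managed memory.
__global__ void readParams(const initValsStruct *params, unsigned int *result)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *result = params->n_lines * params->TD_samples;
    }
}

int main(void)
{
    initValsStruct *params = nullptr;
    unsigned int *result = nullptr;

    // Managed allocations are accessible from both host and device.
    if (cudaMallocManaged(&params, sizeof(initValsStruct)) != cudaSuccess ||
        cudaMallocManaged(&result, sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    // Initialize directly on the host; no cudaMemcpy required.
    *params = initValsStruct{};
    params->n_lines = 4;       // illustrative values
    params->TD_samples = 256;
    *result = 0;

    readParams<<<1, 32>>>(params, result);

    // The host must wait for the kernel before touching managed memory again.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("n_lines * TD_samples = %u\n", *result);

    cudaFree(result);
    cudaFree(params);
    return 0;
}
```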

Your kernel calls should use an initValsStruct pointer in their function signatures.

__device__ void computation(int currentComputation, initValsStruct *initVals, IOdataPtr *io) //actual computation (device function)
{
    // do something
}

This puts your structure into global memory, where it's usable by any device function that receives a pointer to it. Your code uses the __constant__ keyword, which suggests that you're trying to use the device-side constant cache. I recommend using global memory first, to work out the basic features of the CUDA API, and only then moving on to the constant cache. For that route, note that your struct has default member initializers (voxels_x = 0 and voxels_y = 0); these force dynamic initialization, which is not supported for a __constant__ variable. Redefine the struct without any default values, as I've done above, initialize it on the host, and then copy it with cudaMemcpyToSymbol.
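
For completeness, here is a minimal sketch of the __constant__ route, assuming the struct is declared without default member initializers. The kernel readConstant and the value 128 are hypothetical and only there to show the round trip.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// No default member initializers, so the __constant__ variable below
// does not need dynamic initialization.
typedef struct
{
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;
    unsigned int n_lines;
    unsigned int TD_samples;
    unsigned int n_calc;
} initValsStruct;

__constant__ initValsStruct d_initVals;  // lives in constant memory

// Hypothetical kernel: device code accesses the symbol directly by name.
__global__ void readConstant(unsigned int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = d_initVals.n_lines;
    }
}

int main(void)
{
    initValsStruct h_initVals = {};  // initialize on the host
    h_initVals.n_lines = 128;        // illustrative value

    // Pass the symbol itself (not its address) plus a pointer to the host data.
    cudaError_t err = cudaMemcpyToSymbol(d_initVals, &h_initVals,
                                         sizeof(initValsStruct));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    readConstant<<<1, 32>>>(d_out);

    // The blocking copy back waits for the kernel and reports its errors.
    unsigned int h_out = 0;
    err = cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("n_lines read from constant memory: %u\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

Constant memory tends to suit this kind of data, since every thread reads the same small set of values and none of them writes to it.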

That host code won't work, it tries to copy into an uninitialized pointer. And there is no kernel in either the question or your answer — talonmies, Aug 13 '19 at 08:01
The answer to the question implies initialization in the comment block. The point of the answer is that the question-asker should be focusing on using global memory first, using cudaMemcpy, before moving on to more advanced topics like the constant cache — Michael, Aug 13 '19 at 23:02
