I am quite new to CUDA and I have a question regarding memory management for an object. I have one member function that loads the data to the device, and another member function that runs the computation on it.
I have read parts of the NVIDIA programming guide and some SO questions, but there the data copying and the computation happen in a single function, so there is no need for multiple functions.
Some more specifications: the data is read only once. I do not know the data size at compile time, so I need dynamic allocation. My current device has compute capability 2.1 (it will be upgraded to 6.1 soon).
I want to copy the data in a first function and use the data in a different function. For example:
__constant__ int dev_size;
__device__ float* dev_data; //<- not sure about this
/* kernel */
__global__ void computeSomething(float* dev_output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dev_size)
    {
        dev_output[idx] = dev_data[idx] * 100; // some computation
    }
}
// function 1
void OBJECT::copyVolumeToGPU(int size, float* data)
{
    cudaMalloc(&dev_data, size * sizeof(float));
    cudaMemcpy(dev_data, data, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(dev_size, &size, sizeof(int)); // pass the address of size
}
// function 2
void OBJECT::computeSmthOnDevice(int size)
{
    // allocate output array
    auto host_output = new float[size];
    float* dev_output;
    cudaMalloc(&dev_output, size * sizeof(float));
    int block = 256;
    int grid = (size + block - 1) / block; // round up so every element gets a thread
    computeSomething<<<grid, block>>>(dev_output);
    cudaMemcpy(host_output, dev_output, size * sizeof(float), cudaMemcpyDeviceToHost);
    /* ... do something with output ... */
    delete[] host_output;
    cudaFree(dev_output);
}
Error checking (gpuErrChk, done as in https://stackoverflow.com/a/14038590/3921660) wraps every call, but is omitted in this example for brevity.
Can I copy the data using a `__device__` pointer (like `__device__ float* dev_data;`)?
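For comparison, the alternative I am aware of is to keep the device pointer as an ordinary class member and pass it to the kernel as an argument. A rough, uncompiled sketch of that pattern (member names are made up; the kernel would then take the pointer and size as parameters instead of reading globals):

```cuda
class OBJECT {
    float* m_dev_data = nullptr; // host-side copy of the device pointer
    int    m_size     = 0;
public:
    void copyVolumeToGPU(int size, float* data)
    {
        m_size = size;
        cudaMalloc(&m_dev_data, size * sizeof(float));
        cudaMemcpy(m_dev_data, data, size * sizeof(float), cudaMemcpyHostToDevice);
    }
    void computeSmthOnDevice(float* dev_output)
    {
        int block = 256;
        int grid  = (m_size + block - 1) / block;
        computeSomething<<<grid, block>>>(m_dev_data, dev_output, m_size);
    }
};
```

I would prefer to avoid threading the pointer through every call, which is why I am asking about the `__device__` variable approach.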