I have a GPUMatrix class whose data is allocated using cudaMallocManaged:
class GPUMatrix
{
public:
    GPUMatrix() { }
    GPUMatrix(int rows, int cols, unsigned flags = 0) { cudaMallocManaged(data) ... }
    ~GPUMatrix() { cudaFree(data) ... }

    int rows = 0;
    int cols = 0;
    float *data = nullptr;
};
Only the data pointer is accessible by the GPU, so I define my matrix-multiplication kernel to take the objects by value (each argument is a copy):
__global__
void MatMulNaiveKernelMat(const GPUMatrix a, const GPUMatrix b, const GPUMatrix c)...
However, when the kernel call finishes, ~GPUMatrix() is called on the copies and releases the memory. What is the best way to deal with this? I cannot pass a pointer or reference to GPUMatrix to the kernel, since the object itself is not allocated by cudaMallocManaged; only its data member is.
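To make the lifetime issue concrete, here is a minimal host-only sketch in plain C++ of one possible way around it: hand the callee a non-owning view instead of a copy of the owning object. Everything named here is an assumption for illustration. HostMatrix, MatrixView, view(), sumElements, releasesAfterCall and freeCount are hypothetical names, std::malloc and std::free stand in for cudaMallocManaged and cudaFree, and sumElements stands in for the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Counts buffer releases so the lifetime behaviour can be observed.
static int freeCount = 0;

// Non-owning, trivially copyable view: copying or destroying it never frees.
struct MatrixView {
    int rows;
    int cols;
    float *data;
};

// Hypothetical owning class standing in for GPUMatrix.
// std::malloc / std::free stand in for cudaMallocManaged / cudaFree.
class HostMatrix {
public:
    HostMatrix(int r, int c)
        : rows(r), cols(c),
          data(static_cast<float *>(
              std::malloc(sizeof(float) * static_cast<std::size_t>(r) *
                          static_cast<std::size_t>(c)))) {
        assert(data != nullptr);
    }
    ~HostMatrix() {
        std::free(data);
        ++freeCount;
    }

    // Deleting the copy operations turns an accidental by-value pass
    // (and the double free it would cause) into a compile-time error.
    HostMatrix(const HostMatrix &) = delete;
    HostMatrix &operator=(const HostMatrix &) = delete;

    // Hands out the raw pointer and shape without transferring ownership.
    MatrixView view() const { return MatrixView{rows, cols, data}; }

    int rows;
    int cols;
    float *data;
};

// Stand-in for the kernel: takes the view by value, like a kernel argument.
float sumElements(MatrixView m) {
    float total = 0.0f;
    for (int i = 0; i < m.rows * m.cols; ++i) total += m.data[i];
    return total;
}

// Returns the number of releases observed right after the by-value call.
int releasesAfterCall() {
    freeCount = 0;
    HostMatrix a(2, 3);
    for (int i = 0; i < 6; ++i) a.data[i] = 1.0f;
    float total = sumElements(a.view());  // the view copy dies here, no free
    assert(total == 6.0f);
    return freeCount;  // only ~HostMatrix releases the buffer, later
}
```

Under these assumptions, the buffer is released exactly once, when the owning object goes out of scope, and never by the copy handed to the callee. Adapting this to the kernel above would presumably mean declaring its parameters as the view type and passing a.view() at the launch site; that adaptation is a suggestion, not something confirmed by the code in the question.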