
I have a GPUMatrix class with data allocated using cudaMallocManaged:

class GPUMatrix
{
public:
    GPUMatrix() { };
    GPUMatrix(int rows, int cols, unsigned flags = 0) { cudaMallocManaged(&data, rows * cols * sizeof(float)) ... };
    ~GPUMatrix() { cudaFree(data) ... };

public:
    int rows = 0;
    int cols = 0;
    float *data = nullptr;
};

Only the data pointer is accessible by the GPU. I therefore define my mat mul kernel like this (it takes a copy of the objects):

__global__
void MatMulNaiveKernelMat(const GPUMatrix a, const GPUMatrix b, const GPUMatrix c)...

When the kernel finishes, however, the destructor ~GPUMatrix() runs on those by-value copies and releases the memory. What is the best way to deal with this? I cannot pass a pointer or reference to GPUMatrix to the kernel, since the object itself is not allocated with cudaMallocManaged; only its data member is.

pseudomarvin
  • Don't call CUDA APIs in the default constructor or destructor – talonmies Oct 27 '17 at 17:45
  • Of course I could do that, but then I would have to have a pair of `Init`/`Release` methods, right? And I'd have to remember to call them or risk memory leaks, which seems inconvenient. Or are you suggesting something else? – pseudomarvin Oct 29 '17 at 15:18

1 Answer


Your destructor always frees the data pointer. However, the compiler-generated copy constructor gives each copy the same data pointer as the original object, and those copies must not free it.

One way to fix this is to add a flag to your class that indicates whether the object owns the data pointer and must free it. Then define a copy constructor that clears that flag on the copy.
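A minimal sketch of that ownership-flag idea is below. To keep it compilable without a GPU, `new[]`/`delete[]` stand in for `cudaMallocManaged`/`cudaFree` (the comments mark where the real CUDA calls would go); the class shape otherwise mirrors the one in the question.

```cpp
#include <cstddef>

class GPUMatrix
{
public:
    GPUMatrix(int rows, int cols)
        : rows(rows), cols(cols),
          // In real code: cudaMallocManaged(&data, rows * cols * sizeof(float));
          data(new float[static_cast<std::size_t>(rows) * cols]),
          owns(true) { }

    // Copy constructor: shares the pointer but does NOT take ownership,
    // so the kernel's by-value copies never free the memory.
    GPUMatrix(const GPUMatrix &other)
        : rows(other.rows), cols(other.cols), data(other.data), owns(false) { }

    ~GPUMatrix()
    {
        if (owns)
            delete[] data;  // In real code: cudaFree(data);
    }

    int rows = 0;
    int cols = 0;
    float *data = nullptr;
    bool owns = false;  // true only for the original allocation
};
```

With this in place, the kernel's pass-by-value copies destruct harmlessly, and only the original object releases the allocation.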

There are potential issues with this method if a non-owning copy outlives the original object (its data pointer dangles), and a move constructor should be added as well. The copy-assignment and move-assignment operators need the same treatment (the rule of five). See this answer for more information.
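One possible shape for those remaining special members is sketched below, again with `new[]`/`delete[]` standing in for `cudaMallocManaged`/`cudaFree`. Deleting the assignment operators is the simplest safe choice for a sketch; a full implementation would define them with the same ownership-transfer logic as the move constructor.

```cpp
#include <cstddef>
#include <utility>

class GPUMatrix
{
public:
    GPUMatrix(int rows, int cols)
        : rows(rows), cols(cols),
          // In real code: cudaMallocManaged(&data, rows * cols * sizeof(float));
          data(new float[static_cast<std::size_t>(rows) * cols]),
          owns(true) { }

    // Non-owning copy: what a kernel launch makes of each argument.
    GPUMatrix(const GPUMatrix &other)
        : rows(other.rows), cols(other.cols), data(other.data), owns(false) { }

    // Move constructor: transfer ownership so exactly one object ever frees.
    GPUMatrix(GPUMatrix &&other) noexcept
        : rows(other.rows), cols(other.cols), data(other.data), owns(other.owns)
    {
        other.data = nullptr;  // moved-from object no longer frees anything
        other.owns = false;
    }

    // Assignments would need the same logic; deleted here for brevity.
    GPUMatrix &operator=(const GPUMatrix &) = delete;
    GPUMatrix &operator=(GPUMatrix &&) = delete;

    ~GPUMatrix()
    {
        if (owns)
            delete[] data;  // In real code: cudaFree(data);
    }

    int rows = 0;
    int cols = 0;
    float *data = nullptr;
    bool owns = false;
};
```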

1201ProgramAlarm