Is there a better/cleaner/more elegant way to malloc and free in cuda?

Question

I am trying to cudaMalloc a bunch of device pointers, and gracefully exit if any of the mallocs didn't work. I have functioning code - but bloated because I have to cudaFree everything I'd previously malloc'd if one fails. So now I am wondering if there is a more succinct method of accomplishing this. Obviously I can't free something that hasn't been malloc'd - that will definitely cause problems.

Below is the snippet of code I am trying to make more elegant.

    //define device pointers
    float d_norm, *d_dut, *d_stdt, *d_gamma, *d_zeta;

    //allocate space on the device for the vectors and answer
    if (cudaMalloc(&d_norm, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            return;
    };

    if (cudaMalloc(&d_data, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            return;
    };

    if (cudaMalloc(&d_stdt, sizeof(float)*wSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_data);
            return;
    };

    if (cudaMalloc(&d_gamma, sizeof(float)*vSize) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_dut);
            cudaFree(d_stdt);
            return;
    };

    if (cudaMalloc(&d_zeta, sizeof(float)*w) != cudaSuccess) {
            std::cout << "failed malloc";
            cudaFree(d_norm);
            cudaFree(d_dut);
            cudaFree(d_stdt);
            cudaFree(d_gamma);
            return;
    };

This is a shortened version, but you can see how it just keeps building. In reality I am trying to malloc about 15 arrays. It starts getting ugly - but it works correctly.

Thoughts?

`goto` doesn't help with keeping track of the list of pointers that have been allocated vs. those that haven't — Robert Crovella, Sep 08 '16 at 16:08
one simple way: add all items to vector and free them at the end — apple apple, Sep 08 '16 at 16:09
Or you can use RAII and just return? (use something like unique_ptr) — apple apple, Sep 08 '16 at 16:10
There is good example of what I mean: http://stackoverflow.com/questions/788903/valid-use-of-goto-for-error-management-in-c But if it's C++ of course RAII is much better solution. Sorry for misleading comment. — magras, Sep 08 '16 at 16:11
@RobertCrovella Not saying that it is the best solution, but it is simple to overcome if you initialize all pointers to NULL to begin with. — Pavan Yalamanchili, Sep 08 '16 at 19:43
@PavanYalamanchili I don't recall suggesting otherwise. The question suggests a need to keep track of a list of pointers that have already been allocated. `goto` doesn't help with that, in my view. In fact, the null pointer approach is already offered as an answer. The answer I've provided doesn't require keeping track of anything. You don't even need to know the list of pointers to perform `cudaFree()` on. — Robert Crovella, Sep 08 '16 at 19:56

score 4 · Accepted Answer · answered Sep 08 '16 at 16:07

Some possibilities:

cudaDeviceReset() will free all device allocations, without you having to run through a list of pointers.
if you intend to exit (the application), all device allocations are freed automatically upon application termination anyway. The cuda runtime detects the termination of the process associated with an application's device context, and wipes that context at that point. So if you're just going to exit, it should be safe to not perform any cudaFree() operations.

apple apple · Answer 2 · 2016-09-09T12:48:15.257

You can wrap them into unique_ptr with custom deleter. (c++11)
Or just add to one vector when success allocate and free all pointers in the vector.

example about unique_ptr:

#include <iostream>
#include <memory>
using namespace std;

void nativeFree(float* p);
float* nativeAlloc(float value);

class NativePointerDeleter{
public:
   void operator()(float* p)const{nativeFree(p);}
};


int main(){
   using pointer_type = unique_ptr<float,decltype(&nativeFree)>;
   using pointer_type_2 = unique_ptr<float,NativePointerDeleter>;

   pointer_type ptr(nativeAlloc(1),nativeFree);
   if(!ptr)return 0;

   pointer_type_2 ptr2(nativeAlloc(2));//no need to provide deleter
   if(!ptr2)return 0;

   pointer_type ptr3(nullptr,nativeFree);//simulate a fail alloc
   if(!ptr3)return 0;

   /*Do Some Work*/

   //now one can return without care about all the pointers
   return 0;
}

void nativeFree(float* p){
   cout << "release " << *p << '\n';
   delete p;
}
float* nativeAlloc(float value){
   return new float(value);
}

May be [this answer](http://stackoverflow.com/a/19054467/5980430) helps you with unique_ptr — apple apple, Sep 08 '16 at 16:20
Perhaps add a small example demonstrating `unique_ptr` ? Keep in mind this also requires a C++11 compiler. — Pavan Yalamanchili, Sep 08 '16 at 19:44

score 2 · Answer 3 · answered Sep 08 '16 at 16:21

Initially store nullptr in all pointers. free takes no effect on a null pointer.

int* p1 = nullptr;
int* p2 = nullptr;
int* p3 = nullptr;

if (!(p1 = allocate()))
  goto EXIT_BLOCK;
if (!(p2 = allocate()))
  goto EXIT_BLOCK;
if (!(p3 = allocate()))
  goto EXIT_BLOCK;

EXIT_BLOCK:
free(p3); free(p2); free(p1);

score 2 · Answer 4 · answered Sep 09 '16 at 16:25

_{Question is tagged C++, so here is a C++ solution}

The general practice is to acquire resources in constructor and to release in destructor. The idea is that in any circumstances resource is guaranteed to be released by a call to destructor. Neat side effect is that destructor is called automagically in the end of the scope, so you don't need to do anything at all for resource to be released when it's no longer used. See RAII

In the role of resource one might have various memory types, file handles, sockets etc. CUDA device memory is no exception from this general rule.

I would also discourage you from writing your own resource owning classes and would advice to use a library. thrust::device_vector is probably the most widely used device memory container. Thrust library is a part of CUDA toolkit.

score 1 · Answer 5 · answered Feb 22 '18 at 09:28

Yes. If you use (my) CUDA Modern-C++ API wrapper library, you could just use unique pointers, which will release when their lifetime ends. Your code will become merely the following:

auto current_device = cuda::device::current::get();
auto d_dut   = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_stdt  = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_gamma = cuda::memory::device::make_unique<float[]>(current_device, vSize);
auto d_zeta  = cuda::memory::device::make_unique<float[]>(current_device, vSize);

Note, though, that you could just allocate once and just place the other pointers at an appropriate offset.

Is there a better/cleaner/more elegant way to malloc and free in cuda?

5 Answers5

Linked