How to implement RAII of CUDA API type cudaEvent_t using shared_ptr

Question

The CUDA API has types that require create() and destroy() calls analogous to memory allocation new and delete. In the spirit of RAII, and rather than having to call cudaEventCreate( &event) and cudaEventDestory( event ), I wrote the following wrapper for cudaEvent_t.

My Question: Is this acceptable code without any obvious errors?

It builds for me and I've yet to discover a problem. But I particularly do not like the reinterpret_cast<> trickery used to get the cudaEvent_t variable through the custom Allocater and Deleter for the shared_ptr.

Some related posts:

CUDA: Wrapping device memory allocation in C++

Is there a better/cleaner/more elegant way to malloc and free in cuda?

class CudaEvent {
private:
    struct Deleter {
        void operator()(cudaEvent_t * ptr) const {
            checkCudaErrors( cudaEventDestroy( reinterpret_cast<cudaEvent_t>(ptr) ));
        }
    };

    shared_ptr<cudaEvent_t> Allocate( ){
        cudaEvent_t event;
        checkCudaErrors( cudaEventCreate( &event ) );
        shared_ptr<cudaEvent_t> p( reinterpret_cast<cudaEvent_t*>(event), Deleter() );
        return p;
    }

    shared_ptr<cudaEvent_t> ps;

public:
    cudaEvent_t event;

    CudaEvent(  )
    : ps( Allocate( ) ),
      event( *(ps.get()) )
    {   }
};

As with any class which encapsulates runtime API functions in the constructor/destructor actions, you stand the risk of uncontrollable runtime API errors if you use this class at an inappropriate scope. — talonmies, Feb 21 '18 at 17:45
It is worth noting that the cudaEvent_t type can be passed by values, so you seem to be redoing something already done in the CUDA runtime. Besides, CUDA events can be destroyed, right after using it for the last time and without any wait, so this is probably not the most appropriate use of CUDA events ... — cedric, Feb 09 '23 at 09:39

einpoklum · Accepted Answer · 2018-02-22T13:52:28.637

You're conflating two independent mechanisms: A RAII class for CUDA events, and lifetime management using a shared pointer. These should be quite separate.

Another issue is that it's not clear what your "checkCudaErrors" is supposed to do.

A last issue is the one talonmies mentioned, which is what were to happen if you get the scope/lifetime wrong. For example - you've reset the device before the last reference to this event has been released. Or - you enqueue this event on a stream, then drop the point to it. So you're not really guaranteed safety by using the shared pointer - you'll have to keep track of things just as you would if you only had the id. In fact, this might make things even more difficult.

Finally, note that you can use the CUDA runtime API with modern-C++ wrappers, which, specifically, use RAII rather than createXYZ() and destroyXYZ():

https://github.com/eyalroz/cuda-api-wrappers

Specifically, you can have a look at:

The cuda::event_t class Doxygen documentation.
an example program using and managing events.

Due disclosure: I'm the author of this library.

score 1 · Answer 2 · answered Dec 09 '21 at 04:32

You are assuming cudaEvent_t is interconvertible with (void *) here:

shared_ptr<cudaEvent_t> Allocate( ){
    cudaEvent_t event;
    checkCudaErrors( cudaEventCreate( &event ) );
    shared_ptr<cudaEvent_t> p( reinterpret_cast<cudaEvent_t*>(event), Deleter() );
    return p;
}

The "correct" cast would be even worse, since it create a dangling reference just as surely as int *Allocate() { int i; return &i; }.

The proper C++ pattern is to associate cudaEvent_t with the lifetime of a class (and implement move/copy constructors) or else to directly use a shared pointer pointing to the cudaEvent_t.

std::shared_ptr<cudaEvent_t> event (
                    new cudaEvent_t,
                    [](cudaEvent_t *e){ cudaEventDestroy(*e); });
cudaEventCreate( event.get() );

How to implement RAII of CUDA API type cudaEvent_t using shared_ptr

2 Answers2