NVIDIA TensorRT objects such as nvinfer1::IRuntime and nvinfer1::ICudaEngine cannot be stored directly in a std::unique_ptr<>: instead of being deleted, they have a destroy() method that must be called.

So to make this work, you must use a deleter like this:

#include <NvInfer.h>
#include <cuda.h>
#include <memory>

template<typename T>
struct NVIDIADestroyer
{
    void operator()(T * t)
    {
        t->destroy();
    }
};

template<typename T>
using NVIDIAUniquePtr = std::unique_ptr<T, NVIDIADestroyer<T>>;

Instead of std::unique_ptr<T>, you then use NVIDIAUniquePtr<T>.
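
For example, a minimal usage sketch (it assumes TensorRT's nvinfer1::createInferRuntime() factory and an nvinfer1::ILogger instance named logger):

    // 'logger' is whatever ILogger implementation you already use
    NVIDIAUniquePtr<nvinfer1::IRuntime> runtime(nvinfer1::createInferRuntime(logger));
    // runtime->destroy() is called automatically when 'runtime' goes out of scope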

So far, this works fine. What I then attempted to do while cleaning up the code was to replace the deleter with a lambda, so I could skip defining the NVIDIADestroyer structure. But I couldn't figure out how to do this. My thought was something along these lines:

template<typename T>
using NVIDIAUniquePtr = std::unique_ptr<T, [](T * t)
{
    t->destroy();
}>;

But this results in the following error messages:

TRT.hpp:52:45: error: lambda-expression in template-argument
using NVIDIAUniquePtr = std::unique_ptr<T, [](T * t)
                                             ^
TRT.hpp:55:2: error: template argument 2 is invalid
  }>;
  ^

Is there a way to make this work?

Stéphane
  • This is answered by [this question](https://stackoverflow.com/q/42715492/364696), though it's not a duplicate of it (I'm sure there is one, just not finding it right now). – ShadowRanger Jan 28 '21 at 14:24
  • Why do you think the lambda approach would be better? A custom deleter solves the problem in one place and you're done; with the lambda version you have to handle the custom deleter every time you create a `std::unique_ptr`. – Marek R Jan 28 '21 at 14:38
  • @MarekR I do want to use a custom deleter. I just thought there would be a way to do it without having to define a structure whose sole purpose is to help me define the custom deleter. – Stéphane Jan 28 '21 at 14:40
  • In C++20 you can have `template<typename T> using NVIDIADestroyer = decltype([](T* t){ t->destroy(); });` – Caleth Jan 28 '21 at 15:26 (see the sketch after these comments)
  • C++ has been using a custom type for this sort of thing everywhere (previously it was the comparator). Before C++20 you could use a function pointer as the type instead and pass the object in the pointer (except that it's not allowed for unique_ptr, only for comparator objects), but using a type makes it easier for the optimizer to inline the code. In C++20, the lambda is just syntax sugar for defining a struct. tl;dr: The code you already have is idiomatic C++<20 code, don't worry. – user202729 Jan 28 '21 at 17:16
  • Use shared_ptr. It provides for a custom deleter. Create a stand-alone function and use std::make_shared, or better yet, create an allocator and use std::allocate_shared(). – Michaël Roy Jan 28 '21 at 17:17
  • @MichaëlRoy Unnecessary overhead. – user202729 Jan 28 '21 at 17:18
  • @user202729 I get your point... – Michaël Roy Jan 28 '21 at 17:23
  • @Caleth `error: lambda-expression in unevaluated context`. – Maxim Egorushkin Jan 28 '21 at 22:20
  • @MichaëlRoy I'd say never use `std::shared_ptr`, unless your design leaves you no other choice. `std::shared_ptr` is the worst smart pointer in terms of space and time overhead. – Maxim Egorushkin Jan 28 '21 at 22:30
  • @MaximEgorushkin Yes, but the cost of overhead always depends on where it is located and how often it is used. How many calls per second does it take before the overhead of using shared_ptr outweighs the benefits in code safety and maintenance costs? It's probably somewhere in the thousands. Engineering design is all about compromises. If performance was so important _everywhere_, we'd all be programming in assembler... Even user interfaces. And that wouldn't make any sense because of all the other things we need to do with our code, like maintenance and upgrades, would it? – Michaël Roy Jan 29 '21 at 00:12
  • @MaximEgorushkin Neither does all of your thinking. We simply do not have enough details to make these decisions for the OP. – Michaël Roy Jan 29 '21 at 14:08
  • "NVIDIA objects such as `nvinfer1::IRuntime`" <- Note that these things are not in use in, nor offered by, the CUDA runtime API; nor any other NVIDIA software that I know of except NVIDIA TensorRT. – einpoklum Feb 01 '21 at 16:49

4 Answers

Using a stateless deleter defined as a struct or class has had zero run-time and space overhead since C++11; it can't get better than that.

Using a function template inside the deleter, instead of making the deleter itself a class template, removes the need to specify the deleter's template arguments, as well as the need to include the CUDA/TensorRT header files in the header that defines it.

noexcept on deleter functions may result in smaller calling code, because no compiler-generated stack-unwinding code is necessary in the caller around noexcept calls. (The GNU C++ standard library's ~unique_ptr() is noexcept unconditionally, but the C++ standard doesn't require that; the GNU C++ standard library probably does it for exactly the reason I stated. Too bad noexcept is not deduced and applied by the compiler automatically, for ABI-stability reasons (the #1 reason we cannot have good things in C++); that could theoretically be overridden with an explicit user-provided noexcept specification, but that's a big subject on its own.)

Since C++17 a capture-less lambda closure can also be used as a deleter with zero overhead:

#include <memory>
#include <iostream>

// C++11
struct Deleter { template<class P> void operator()(P p) noexcept { p->destroy(); } };
template<class T> using P11 = std::unique_ptr<T, Deleter>;

// C++17
constexpr auto deleter = [](auto p) noexcept { p->destroy(); };
template<class T> using P17 = std::unique_ptr<T, decltype(deleter)>;

int main() {
    std::cout << sizeof(void*) << '\n';
    std::cout << sizeof(P11<void>) << '\n';
    std::cout << sizeof(P17<void>) << '\n';
}

Compiled with -std=c++17, this outputs:

8
8
8
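
For illustration, a separate usage sketch reusing the P11/P17 aliases and deleter from the code above (Destroyable is a hypothetical stand-in for the real TensorRT interfaces). Note that before C++20 a closure type is not default-constructible, so with the C++17 alias the lambda object has to be passed alongside the pointer:

    struct Destroyable { void destroy() { delete this; } };  // hypothetical stand-in

    int main() {
        P11<Destroyable> a(new Destroyable);           // Deleter{} is default-constructed
        P17<Destroyable> b(new Destroyable, deleter);  // pre-C++20: pass the closure explicitly
    }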
Maxim Egorushkin

A different approach would be wrapping your weird-deletion objects to make them behave. Perhaps something along these lines:

#include <utility>

template <typename Wrapped>
struct raiified : Wrapped {

    template <typename... Args>
    raiified(Args&&... args) : Wrapped(std::forward<Args>(args)...) { }

    raiified(const Wrapped& wrapped) : Wrapped(wrapped) { }
    raiified(Wrapped&& wrapped) : Wrapped(std::move(wrapped)) { }
    raiified(const raiified& other) : Wrapped(other) { }
    raiified(raiified&& other) : Wrapped(std::move(other)) { }

    // operator overloads?

    // The whole point: the destructor calls destroy() before the object goes away.
    ~raiified() { this->destroy(); }
};

With that, you should be able to use std::unique_ptr<raiified<nvinfer1::IRuntime>>. Or maybe:

namespace infer {

template <typename T>
using unique_ptr = std::unique_ptr<raiified<T>>;

}
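
For illustration, a usage sketch reusing the raiified template above, with a hypothetical Widget stand-in (the real TensorRT interfaces are abstract and come from factory functions, so this wrapper mainly fits types you can construct yourself):

#include <memory>

struct Widget { void destroy() { /* release resources */ } };  // hypothetical stand-in

int main() {
    auto w = std::make_unique<raiified<Widget>>();
}   // ~raiified() calls destroy(), then the default deleter frees the memory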
einpoklum

It's hard to store a lambda (closure) under one specific type, the way std::function does, but we can store it on the heap with manual memory management.

Note that the C++ standard does not guarantee that a lambda closure is trivially copyable, so we cannot memcpy it directly; we can only wrap it.

Reference: asio/experimental/detail/channel_service.hpp:try_receive

See the pseudocode below:


#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new
#include <utility>   // std::move

class MyWrapper {
public:
    MyWrapper() = default;
    MyWrapper(const MyWrapper&) = delete;   // copying is not handled here

    // Store a copy of the closure in manually managed heap memory.
    template<typename T>
    void set(T&& token) {
        mem = std::malloc(sizeof(token));
        new (mem) T(std::move(token));
    }

    // Destroy and free it again; the caller must know T.
    template<typename T>
    void release() {
        static_cast<T*>(mem)->~T();
        std::free(mem);
    }

private:
    void* mem = nullptr;
}; // end class

We must know the type T when constructing and destroying this structure, and we cannot avoid that, since C++ has no runtime type support for this. But we can call the lambda without naming T, by binding an executor into the structure (via std::bind, or simply by wrapping it in another callable).
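
A minimal sketch of that "bind an executor" idea; the names below are hypothetical and chosen only to illustrate the type erasure (copying and alignment are ignored to keep it short):

#include <new>
#include <type_traits>
#include <utility>

class ErasedCallback {
public:
    ErasedCallback() = default;
    ErasedCallback(const ErasedCallback&) = delete;  // copying not handled in this sketch

    // Store the closure and remember how to invoke and destroy it,
    // without exposing its type to the rest of the class.
    template<typename T>
    void set(T&& token) {
        using U = std::decay_t<T>;
        mem = ::operator new(sizeof(U));
        new (mem) U(std::forward<T>(token));
        invoke  = [](void* p) { (*static_cast<U*>(p))(); };   // the bound "executor"
        destroy = [](void* p) { static_cast<U*>(p)->~U(); };
    }

    void call() { invoke(mem); }

    ~ErasedCallback() {
        if (mem) { destroy(mem); ::operator delete(mem); }
    }

private:
    void* mem = nullptr;
    void (*invoke)(void*)  = nullptr;  // knows U, stored as a plain function pointer
    void (*destroy)(void*) = nullptr;
};

With this, call() can run the stored closure without its type appearing anywhere outside set(), which is exactly what the manual release<T>() above cannot do.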

vrqq

Here is the correct syntax for using a lambda function as a deleter, which was your original question.

#include <functional>
#include <memory>

template <typename T>
using NVIDIAUniquePtr = std::unique_ptr<T, std::function<void(T*)>>;

template <typename T, typename... Args>
NVIDIAUniquePtr<T> make_nvidia_unique(Args&&... args) {
  return NVIDIAUniquePtr<T>(
      new T(std::forward<Args>(args)...), [](T* p) {
        p->destroy();
        delete p;
      });
}
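
For illustration, a usage sketch with a hypothetical stand-in type, showing the size cost discussed next:

struct Widget { void destroy() { /* release resources */ } };  // hypothetical stand-in

int main() {
  auto w = make_nvidia_unique<Widget>();
  // sizeof(w) is sizeof(Widget*) plus sizeof(std::function<void(Widget*)>),
  // typically 40 bytes on a 64-bit libstdc++ build instead of 8.
}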

But, as pointed out by many, this increases the size of your unique_ptr. This may or may not matter to you, but there is a better way...

Using a deleter object of size zero saves those 32 bytes (implementation-dependent) per object. The templated move constructors in the deleter class below are needed for moving a unique_ptr<T> into a unique_ptr of a base class of T, which makes them necessary in production code that relies on such conversions.

Maxim Egorushkin's solution is good, but not complete.

#include <iostream>
#include <memory>
#include <type_traits>

template <typename T>
struct nv_deleter {
  nv_deleter() noexcept = default;
  nv_deleter(nv_deleter&&) noexcept = default;

  template <typename U,
            typename = std::enable_if_t<std::is_convertible<U*, T*>::value>>
  nv_deleter(nv_deleter<U>&&) noexcept {}

  template <typename U,
            typename = std::enable_if_t<std::is_convertible<U*, T*>::value>>
  nv_deleter& operator=(nv_deleter<U>&&) noexcept { return *this; }

  // It is important that this does not throw. If destroy() may throw,
  // add a try/catch block.
  void operator()(T* p) noexcept {
    p->destroy();
    delete p;    // needed if destroy() does not free the object itself;
                 // without it you end up with memory leaks
  }
};

template <typename T>
using NVIDIAUniquePtr = std::unique_ptr<T, nv_deleter<T>>;

template <typename T, typename... Args>
auto make_nvidia_unique(Args&&... args) {
  return NVIDIAUniquePtr<T>(new T(std::forward<Args>(args)...),
                            nv_deleter<T>{});
}

int main() {
  struct Base {
    virtual ~Base() noexcept {}
    virtual void destroy() noexcept {}  // it is preferable that this is noexcept,
                                        // but you may not have any control 
                                        // over external library code 
  };

  struct Derived : Base {};

  NVIDIAUniquePtr<Derived> pDerived = make_nvidia_unique<Derived>();

  // this is the case where the templated move constructors are necessary. 
  NVIDIAUniquePtr<Base> pBase = std::move(pDerived);

  std::cout << sizeof(pDerived) << '\n';  // prints sizeof(void*) -> 8
  std::cout << sizeof(pBase) << '\n';     // prints sizeof(void*) -> 8

  return 0;
}
Michaël Roy
  • Unfortunately this has a non-trivial cost associated with it. The OP's type has a zero-sized deleter, so the overall size of their unique_ptr is that of a pointer. Using `std::function` adds another 32 bytes (or whatever your implementation wants for inline storage) to it, to be copied around. It's "too strong" - we are only ever going to invoke one function, yet `std::function` can call anything. – GManNickG Jan 28 '21 at 18:47
  • Well, the OP did not seem concerned about these 32 bytes... which will always be there when using a lambda. The alternative is to clone std::default_delete<>, but it's time for me to close my laptop and have dinner :). In any case, I'm sure the encapsulated objects nvinfer1::IRuntime and nvinfer1::ICudaEngine are themselves quite a bit larger than 32 bytes. – Michaël Roy Jan 28 '21 at 19:15
  • The lambda doesn't need to take space. Specializing `default_delete` is never a good idea. – Potatoswatter Jan 28 '21 at 20:28
  • I didn't suggest specializing default_delete<>. I suggested cloning it. – Michaël Roy Jan 29 '21 at 00:02