
I have different memory allocators in my code: one for CUDA (managed or not) and one for pure host memory. I could also imagine situations where you want different allocation algorithms: one for large, long-lived blocks, for example, and another for short-lived, small objects.

I wonder how to implement such a system properly.

Placement new?

My current solution uses placement new, where the pointer decides which memory and memory allocator to use. Care must then be taken when deleting/de-allocating the objects. Currently, it works, but I think it's not a nice solution.

MyObj* cudaObj = new (allocateCudaMemoryField(sizeof(MyObj))) MyObj(arg1, arg2);
MyObj* hostObj = new (allocateHostMemoryField(sizeof(MyObj))) MyObj(arg1, arg2);
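The teardown then has to run the destructor explicitly and hand the memory back to the allocator that produced it. A self-contained sketch (the field-allocator functions here are simulated stand-ins, not the real CUDA ones):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct MyObj { MyObj(int a, int b) : x(a + b) {} int x; };

// Simulated stand-ins for the real field allocators.
void* allocateCudaMemoryField(std::size_t n) { return std::malloc(n); }
void  deallocateCudaMemoryField(void* p)     { std::free(p); }

int example()
{
    MyObj* cudaObj = new (allocateCudaMemoryField(sizeof(MyObj))) MyObj(1, 2);
    int result = cudaObj->x;
    // Teardown: destructor first, then return the raw memory
    // to the matching allocator.
    cudaObj->~MyObj();
    deallocateCudaMemoryField(cudaObj);
    return result;
}
```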

Overload new, but how?

I'd like to go for a solution with an overloaded new operator. Something that will look as follows:

MyObj* cudaObj = CudaAllocator::new MyObj(arg1, arg2);
MyObj* hostObj = HostAllocator::new MyObj(arg1, arg2);
CudaAllocator::delete cudaObj;
HostAllocator::delete hostObj;

I think I could achieve this by having namespaces CudaAllocator and HostAllocator, each with an overloaded new and delete.

Two questions:

  • Is it reasonable to have different overloads of new in a codebase, or is this a sign of a design flaw?
  • If it's OK, how is it best implemented?
Michael
    Think first about the objects you don't directly allocate yourself, such as those inside containers such as std::vector. Typically, when overloading new, you want to associate the overload with the type being allocated, as you will find in many online examples, so a MyObj allocated inside a container gets the same allocator as one you allocate yourself. Your idea **may** be a better fit for **your** goals. You gave an example of the same type allocated with different allocators. I'm just saying you should think through the harder cases before deciding. – JSF Jul 09 '15 at 12:15
    CUDA custom memory allocators are pretty much a solved problem in the CUDA thrust library (see [here](http://stackoverflow.com/q/9007343/681865) for example). It might be just as easy to use the existing thrust infrastructure as to (re)invent your own. – talonmies Jul 09 '15 at 13:06
  • The idea with an inherited custom allocator seems good, but I think the thrust library itself does not help here. I have to make objects available on the GPU and I do not see how thrust would help me to do so in a way which is not already covered by CUDA's managed memory. – Michael Jul 09 '15 at 13:53
  • You should have a look at http://stackoverflow.com/questions/6210921/operator-new-inside-namespace. An allocation function must be in the global or class scope. – Coder Jul 22 '15 at 12:58

1 Answer


There is a time and place for overloading operator new/delete, but it is generally best reserved for cases where simpler measures have been exhausted.

The main disadvantage of placement new is that it requires the caller to "remember" how the object was allocated and take the appropriate action to invoke the corresponding de-allocation when that object has reached the end of its lifespan. Additionally, requiring the caller to invoke placement new is syntactically burdensome (I presume this is the "not a nice solution" you mention.)

The main disadvantage to overloading new/delete is that it is meant to be done once for a given type (as @JSF pointed out). This tightly couples an object to the way it is allocated/deallocated.

Overloaded new/delete

Presuming this set up:

#include <cstddef>
#include <iostream>
#include <memory>
#include <utility>

void* allocateCudaMemoryField(std::size_t size)
{
   std::cout << "allocateCudaMemoryField" << std::endl;
   return new char[size]; // simulated
}
void* allocateHostMemoryField(std::size_t size)
{
   std::cout << "allocateHostMemoryField" << std::endl;
   return new char[size]; // simulated
}
void deallocateCudaMemoryField(void* ptr, std::size_t)
{
   std::cout << "deallocateCudaMemoryField" << std::endl;
   delete[] static_cast<char*>(ptr); // simulated
}
void deallocateHostMemoryField(void* ptr, std::size_t)
{
   std::cout << "deallocateHostMemoryField" << std::endl;
   delete[] static_cast<char*>(ptr); // simulated
}

Here's MyObj with overloaded new/delete (your question):

struct MyObj
{
   MyObj(int arg1, int arg2)
   {
      std::cout << "MyObj()" << std::endl;
   }
   ~MyObj()
   {
      std::cout << "~MyObj()" << std::endl;
   }
   static void* operator new(std::size_t size)
   {
      std::cout << "MyObj::new" << std::endl;
      return ::operator new(size);
   }
   static void operator delete(void* ptr)
   {
      std::cout << "MyObj::delete" << std::endl;
      ::operator delete(ptr);
   }
};

MyObj* const ptr = new MyObj(1, 2);
delete ptr;

Prints the following:

MyObj::new
MyObj()
~MyObj()
MyObj::delete

C Plus Plusy Solution

A better solution might be to use RAII pointer types combined with a factory to hide the details of allocation and deallocation from the caller. This solution uses placement new, but handles deallocation by attaching a deleter callback method to a unique_ptr.

class MyObjFactory
{
public:
   static auto MakeCudaObj(int arg1, int arg2)
   {
      constexpr std::size_t size = sizeof(MyObj);
      MyObj* const ptr = new (allocateCudaMemoryField(size)) MyObj(arg1, arg2);
      return std::unique_ptr <MyObj, decltype(&deallocateCudaObj)> (ptr, deallocateCudaObj);
   }
   static auto MakeHostObj(int arg1, int arg2)
   {
      constexpr std::size_t size = sizeof(MyObj);
      MyObj* const ptr = new (allocateHostMemoryField(size)) MyObj(arg1, arg2);
      return std::unique_ptr <MyObj, decltype(&deallocateHostObj)> (ptr, deallocateHostObj);
   }

private:
   static void deallocateCudaObj(MyObj* ptr) noexcept
   {
      ptr->~MyObj();
      deallocateCudaMemoryField(ptr, sizeof(MyObj));
   }
   static void deallocateHostObj(MyObj* ptr) noexcept
   {
      ptr->~MyObj();
      deallocateHostMemoryField(ptr, sizeof(MyObj));
   }
};

{
  auto objCuda = MyObjFactory::MakeCudaObj(1, 2);
  auto objHost = MyObjFactory::MakeHostObj(1, 2);
}

Prints:

allocateCudaMemoryField
MyObj()
allocateHostMemoryField
MyObj()
~MyObj()
deallocateHostMemoryField
~MyObj()
deallocateCudaMemoryField

Generic Version

This gets better. With this same strategy, we can handle the allocation/deallocation semantics for any class.

class Factory
{
public:
   // Generic versions that don't care what kind of object is being allocated
   template <class T, class... Args>
   static auto MakeCuda(Args&&... args)
   {
      constexpr std::size_t size = sizeof(T);
      T* const ptr = new (allocateCudaMemoryField(size)) T(std::forward<Args>(args)...);
      using Deleter = void(*)(T*);
      using Ptr = std::unique_ptr <T, Deleter>;
      return Ptr(ptr, deallocateCuda <T>);
   }
   template <class T, class... Args>
   static auto MakeHost(Args&&... args)
   {
      constexpr std::size_t size = sizeof(T);
      T* const ptr = new (allocateHostMemoryField(size)) T(std::forward<Args>(args)...);
      using Deleter = void(*)(T*);
      using Ptr = std::unique_ptr <T, Deleter>;
      return Ptr(ptr, deallocateHost <T>);
   }

private:
   template <class T>
   static void deallocateCuda(T* ptr) noexcept
   {
      ptr->~T();
      deallocateCudaMemoryField(ptr, sizeof(T));
   }
   template <class T>
   static void deallocateHost(T* ptr) noexcept
   {
      ptr->~T();
      deallocateHostMemoryField(ptr, sizeof(T));
   }
};

Used with a new class S:

struct S
{
   S(int x, int y, int z) : x(x), y(y), z(z)
   {
      std::cout << "S()" << std::endl;
   }
   ~S()
   {
      std::cout << "~S()" << std::endl;
   }
   int x, y, z;
};
{
   auto objCuda = Factory::MakeCuda <S>(1, 2, 3);
   auto objHost = Factory::MakeHost <S>(1, 2, 3);
}

Prints:

allocateCudaMemoryField
S()
allocateHostMemoryField
S()
~S()
deallocateHostMemoryField
~S()
deallocateCudaMemoryField

I didn't want to crank the templating full blast, but that code is obviously ripe for DRYing out (parameterize the implementations on the allocator functions).
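For instance, both Make functions could collapse into a single template parameterized on the allocation/deallocation pair. A sketch, where `allocateField`/`deallocateField` are stand-ins for the field allocators above:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Simulated stand-ins for one allocator pair (swap in the CUDA/host versions).
void* allocateField(std::size_t size) { return ::operator new(size); }
void deallocateField(void* ptr, std::size_t) { ::operator delete(ptr); }

using AllocFn = void* (*)(std::size_t);
using DeallocFn = void (*)(void*, std::size_t);

// One factory template, parameterized on the allocator pair.
template <AllocFn Alloc, DeallocFn Dealloc>
struct FactoryFor
{
   template <class T, class... Args>
   static auto Make(Args&&... args)
   {
      T* const ptr = new (Alloc(sizeof(T))) T(std::forward<Args>(args)...);
      using Deleter = void (*)(T*);
      return std::unique_ptr<T, Deleter>(ptr, &Destroy<T>);
   }

private:
   template <class T>
   static void Destroy(T* ptr) noexcept
   {
      ptr->~T();
      Dealloc(ptr, sizeof(T));
   }
};

// Each allocator pair then becomes a one-line alias:
using HostFactory = FactoryFor<allocateField, deallocateField>;
```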

Considerations

This works out pretty well when your objects are relatively large and not allocated/deallocated too frequently. I wouldn't use this if you have millions of objects coming and going every second.

Some of the same strategies still work, but you'll also want to consider tactics such as:

  • bulk allocation/deallocation at the beginning/end of a processing stage
  • object pools that maintain a free list
  • C++ allocator objects for containers like vector
  • etc.

It really depends on your needs.
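As an illustration of the free-list idea, here's a minimal, single-threaded, fixed-capacity pool (a sketch only; names are illustrative and it does no locking or overflow handling beyond returning null):

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Minimal fixed-size object pool with an intrusive free list.
// Freed slots are recycled, not returned to the underlying allocator.
template <class T, std::size_t N>
class Pool
{
   union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
   Slot slots[N];
   Slot* free_head;

public:
   Pool() : free_head(&slots[0])
   {
      // Thread all slots onto the free list.
      for (std::size_t i = 0; i + 1 < N; ++i)
         slots[i].next = &slots[i + 1];
      slots[N - 1].next = nullptr;
   }

   template <class... Args>
   T* make(Args&&... args)
   {
      if (!free_head) return nullptr;   // pool exhausted
      Slot* s = free_head;
      free_head = s->next;
      return new (s->storage) T(std::forward<Args>(args)...);
   }

   void destroy(T* p)
   {
      p->~T();
      Slot* s = reinterpret_cast<Slot*>(p);  // storage shares the slot's address
      s->next = free_head;
      free_head = s;
   }
};
```

Allocation and deallocation are then pointer swaps, which is what makes this attractive for high-churn, small objects.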

tl;dr

No. Don't overload new/delete in this situation. Build an allocator that delegates to your generic memory allocators.
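For standard containers, that delegating allocator can be a minimal std::allocator-compatible type. A sketch; `HostFieldAllocator` and the two field functions below are illustrative stand-ins for your real host/CUDA pair:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Stand-ins for the real field allocation functions.
void* allocateHostField(std::size_t size) { return ::operator new(size); }
void deallocateHostField(void* ptr, std::size_t) { ::operator delete(ptr); }

// Minimal C++11 allocator that delegates to the field functions;
// std::allocator_traits fills in the rest.
template <class T>
struct HostFieldAllocator
{
   using value_type = T;

   HostFieldAllocator() = default;
   template <class U>
   HostFieldAllocator(const HostFieldAllocator<U>&) {}

   T* allocate(std::size_t n)
   {
      return static_cast<T*>(allocateHostField(n * sizeof(T)));
   }
   void deallocate(T* p, std::size_t n)
   {
      deallocateHostField(p, n * sizeof(T));
   }
};

template <class T, class U>
bool operator==(const HostFieldAllocator<T>&, const HostFieldAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const HostFieldAllocator<T>&, const HostFieldAllocator<U>&) { return false; }

// Usage: element storage comes from the custom allocator.
std::vector<int, HostFieldAllocator<int>> v{1, 2, 3};
```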

wesholler
  • If you want to use this object in a STL container, you're out of luck with this solution (unless you make a container of pointers and perform the memory allocation/deallocation yourself) – xryl669 Feb 28 '22 at 17:26