34

I have recently come across Howard Hinnant's short_alloc and this is the single best example of custom allocators I have seen.

But as I spent more time studying the code to integrate it in my personal project, it occurred to me that the arena class, providing the stack-based allocations, might not always return properly aligned memory. In fact, I fear that only the first allocation is guaranteed to be suitably aligned (as the buffer itself has a forced alignment), see below relevant code fragments:

template <std::size_t N>
class arena
{
  static const std::size_t alignment = 16;
  alignas(alignment) char buf_[N];
  char* ptr_;
  //...
};

template <std::size_t N>
char*
arena<N>::allocate(std::size_t n)
{
  assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
  if (buf_ + N - ptr_ >= n)
  {
    char* r = ptr_;
    ptr_ += n;
    return r;
  }
  return static_cast<char*>(::operator new(n));
}

I can think of a few ways to fix this (at the cost of some memory wastage), the easiest being to round the size in the allocate/deallocate function to a multiple of alignment.

But before changing anything, I would like to make sure that I am not missing something here...

pragmatib
  • 443
  • 5
  • 7
  • 9
    Paging doctor [@HowardHinnant](http://stackoverflow.com/users/576911/howard-hinnant) – sehe Nov 15 '15 at 21:27

1 Answers1

42

This code was written before I had std::max_align_t in my toolbox (which now lives in <cstddef>). I would now write this as:

static const std::size_t alignment = alignof(std::max_align_t);

which on my system is exactly equivalent to the current code, but now more portable. This is the alignment which new and malloc are guaranteed to return. And once you have this "maximally aligned" buffer, you can put an array of any one type in it. But you can't use the same arena for different types (at least not different types that have different alignment requirements). And for that reason, perhaps it would be best to template arena on a second size_t, which is equal to the the alignof(T). In that way you can prevent the same arena from being accidentally used by types with differing alignment requirements:

arena<N, alignof(T)>& a_;

Assuming each allocation from arena has the same alignment requirements, and assuming the buffer is maximally aligned, then every allocation from the buffer will be suitably aligned for T.

E.g. on my system alignof(std::max_align_t) == 16. A buffer with this alignment can hold arrays of:

  • Types with alignof == 1.
  • Types with alignof == 2.
  • Types with alignof == 4.
  • Types with alignof == 8.
  • Types with alignof == 16.

As some environment may support types that have "super alignment" requirements, an added safety precaution would be to add (say within short_alloc):

static_assert(alignof(T) <= alignof(std::max_align_t), "");

If you are super paranoid you could also check that alignof(T) is a power of 2, though the C++ standard itself guarantees that this will always be true ([basic.align]/p4).

Update

I have taken a closer look at this problem and believe that rounding the requested allocation size up to the next alignment (as the OP suggested) is the best solution. I have updated "short_alloc" to do that on my website.

template <std::size_t N>
char*
arena<N>::allocate(std::size_t n)
{
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    n = align_up(n);
    if (buf_ + N - ptr_ >= n)
    {
        char* r = ptr_;
        ptr_ += n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

For special situations where you know that you don't need maximally aligned allocations (e.g. vector<unsigned char>), one can simply tweak alignment appropriately. And one could also have short_alloc::allocate pass alignof(T) to arena::allocate and assert(requested_align <= alignment)

template <std::size_t N>
char*
arena<N>::allocate(std::size_t n, std::size_t requested_align)
{
    assert(requested_align <= alignment);
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    n = align_up(n);
    if (buf_ + N - ptr_ >= n)
    {
        char* r = ptr_;
        ptr_ += n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

This would give you confidence that if you adjusted alignment downward, you didn't adjust it too far downward.

Update Again!

I've updated the description and code of this allocator quite a bit because of this excellent question (I had neglected this code for years).

The alignment check mentioned in the previous update is now done at compile-time (compile-time errors are always superior to run-time errors, even asserts).

Both the arena and short_alloc is now templated on alignment so that you can easily customize the alignment requirements you anticipate (and if you guess too low it is caught at compile-time). This template parameter is defaulted to alignof(std::max_align_t).

The arena::allocate function now looks like:

template <std::size_t N, std::size_t alignment>
template <std::size_t ReqAlign>
char*
arena<N, alignment>::allocate(std::size_t n)
{
    static_assert(ReqAlign <= alignment, "alignment is too small for this arena");
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    auto const aligned_n = align_up(n);
    if (buf_ + N - ptr_ >= aligned_n)
    {
        char* r = ptr_;
        ptr_ += aligned_n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

Thanks to alias templates, this allocator is easier to use than ever. For example:

// Create a vector<T> template with a small buffer of 200 bytes.
//   Note for vector it is possible to reduce the alignment requirements
//   down to alignof(T) because vector doesn't allocate anything but T's.
//   And if we're wrong about that guess, it is a comple-time error, not
//   a run time error.
template <class T, std::size_t BufSize = 200>
using SmallVector = std::vector<T, short_alloc<T, BufSize, alignof(T)>>;

// Create the stack-based arena from which to allocate
SmallVector<int>::allocator_type::arena_type a;
// Create the vector which uses that arena.
SmallVector<int> v{a};

This isn't necessarily the final word in such allocators. But hopefully this is a solid foundation upon which you can build your custom allocators.

Howard Hinnant
  • 206,506
  • 52
  • 449
  • 577
  • Thank you very much for the detailed answer. The problem I had observed was exactly that: "the same arena being accidentally used by types with differing alignment requirements". So far I have "fixed" it by rounding up the size of [de]allocations to multiples of `max_align_t`. I like your suggestion to add the alignment as a template parameter of `arena` better, but am not sure how this will work in combination with `short_alloc::rebind`. i.e. won't there be cases where the underlying `arena` can no longer be shared by two related allocators? Anyway, I will give it a try. Thanks again. – pragmatib Dec 12 '15 at 14:37
  • @pragmatib: Your fix is better than mine. I've adjusted my answer and updated my website. – Howard Hinnant Dec 12 '15 at 17:47
  • cool, thanks for the updated code. In the `allocate` method, shouldn't you introduce a temporary variable to hold the aligned-up value? Without it, looks to me that the fall-back to `::operator new` might request a bit too much memory. – pragmatib Dec 13 '15 at 17:04
  • @pragmatib: That's not a bad suggestion, thanks. And thanks to you I've also continued to update `short_alloc` over the past day to bring it up to what I consider modern standards. My answer is already obsolete again (compared to my website). The biggest change is that I'm now checking `requested_align <= alignment` at compile time. I'm also trying to make it more convenient to request a reduced-alignment allocator/arena. When I quit changing things I will update this answer again. – Howard Hinnant Dec 13 '15 at 17:31
  • Wow, this is an awesome answer. I will need a bit more time to digest the details but it looks great. FWIW, I had been toying with another idea: support 2 two different alignments in a single `arena`. An "optimized" one, chosen upon first request, whose allocations would be reserved from the head of the buffer, and `max_align_t` for which the allocations would be reserved from the tail. I have not progressed much on the implementation, and I'm not sure that it's worth the added complexity compared to your new version. Thanks for all your feedback and expert advice - much appreciated. – pragmatib Dec 14 '15 at 15:26
  • 1
    @HowardHinnant please add license information to your code so it can be safely used. Most projects prohibit the use of unlicensed code, even open-source projects. – Eloff Dec 26 '15 at 22:14
  • while using this with unordered_map on MSVC 2017 (C++14),memuse() prints the following: memory = 0 alloc = 0 memory = 0 alloc = 0 memory = 24 alloc = 1 memory = 48 alloc = 2 memory = 48 alloc = 2 1 2 3 4 – lalitm Mar 19 '18 at 09:07
  • 1
    Crazy idea: can the templated implementation automatically pick the correct alignment (i.e. the minimum power-of-2 that will work), based on SFINAE and something like https://stackoverflow.com/questions/49079196/sfinae-to-assert-that-code-does-not-compile ? That is: try alignment=1, then if THIS_CODE_DOES_NOT_COMPILE(...some sample code that creates a container and inserts an item...), then try (via template recursion) alignment=alignment*2, etc. with recursion base case set to aligment=alignof(std::max_align_t). – Don Hatch Apr 05 '20 at 03:38
  • 1
    Suggestion for making the "this allocator is easier to use than ever" part even easier: `template struct HasA { WhatIHas what_i_has; }; template struct ShortContainer : private HasA, public Container { ShortContainer(): Container(this->what_i_has) {} }; template using ShortVector = ShortContainer>>;` Then it can be used as simply `ShortVector v;` or `ShortVector v;` – Don Hatch Apr 07 '20 at 01:50
  • 1
    Also, worth noting: when using std::vector with this allocator, one should *always* call reserve(BufSizeBytes/sizeof(T)) immediately, to avoid unexpected silent spillover to the heap. This becomes apparent if we use a simpler variant of short_alloc that hard-errs instead of spilling over to the heap. The SmallVector type should perhaps do such a reserve() automatically in its ctor (but unfortunately that might muddy the exposition of this answer a bit, so maybe just call this out briefly at the end, instead). – Don Hatch Apr 07 '20 at 02:02
  • 1
    One more suggestion for the SmallVector example: when I see `SmallVector`, I immediately think it's talking about 1000 ints; I (and perhaps others?) find it surprising that it's really talking about 1000/sizeof(int) ints. Perhaps it would be less surprising if that template param was MaxItems rather than BufSizeBytes? – Don Hatch Apr 07 '20 at 02:26