Questions about Hinnant's stack allocator

Question

I've been using Howard Hinnant's stack allocator and it works like a charm, but some details of the implementation are a little unclear to me.

Why are global operators new and delete used? The allocate() and deallocate() member functions use ::operator new and ::operator delete respectively. Similarly, the member function construct() uses the global placement new. Why not allow for any user-defined global or class-specific overloads?
Why is alignment set to hard-coded 16 bytes instead of std::alignment_of<T>?
Why do the constructors and max_size have a throw() exception specification? Isn't this discouraged (see e.g. More Effective C++ Item 14.)? Is it really necessary to terminate and abort when an exception occurs in the allocator? Does this change with the new C++11 noexcept keyword?
The construct() member function would be an ideal candidate for perfect forwarding (to the constructor that is being called). Is this the way to write C++11 conformant allocators?
What other changes are necessary to make the current code C++11 conformant?

`::new (p) T` guarantees you that you will construct a `T` and nothing else will happen. If a class wants to provide an allocation function that has the same signature as the usual global placement new, then presumably it does something more. Think of `::new (p) T` as an explicit constructor call, not as memory allocation (the latter of which makes sense to override. (Note that's it's not possible to override the usual global placement new.) — Luc Danton, Jul 25 '12 at 11:03
@LucDanton OK, so if a class has defined its own placement new (e.g. for logging purposes), this will still be called by `::new(p) T`? — TemplateRex, Jul 25 '12 at 11:07
@rhalbersma if you want to log that kind of thing, log in the constructor. That placement `new` is (unlike other forms of `new`) a primitive of the language, and that's why overriding it is very sketchy. — R. Martinho Fernandes, Jul 25 '12 at 11:09
No. And that class should just as well log in the constructor. The point of `void* operator new(std::size_t, void*);` is that it's a no-op. — Luc Danton, Jul 25 '12 at 11:10
For the alignment at least, Effective C++ (3rd Ed.) says (Item 50, p. 249) : "C++ requires that all `operator new`s return pointers that are suitably aligned for *any* data type. `malloc` labors under the same requirement." This usually means 16-byte aligned, so he is being consistent with that. Don't know if c11 and c++11 are the same, but it's likely. — BoBTFish, Jul 25 '12 at 11:10
@BoBTFish If I have a struct that is precisely one cacheline (64-bytes) long, how do I align a the data of a `std::vector` of such structs then to a 64-byte boundary? — TemplateRex, Jul 25 '12 at 11:13
@rhalbersma That `struct` can be safely aligned at a smaller boundary. However, if you want to *force* it at a 64-byte boundary, you'll have to do it yourself, but I can't say I know much about this. C++11 has some facilities for this. — BoBTFish, Jul 25 '12 at 11:23
To complement BoBTFish's comment, there's `alignas` for declaring aligned members, `std::aligned_storage` for aligned automatic raw storage, and `std::align` for aligned dynamically allocated raw storage. — R. Martinho Fernandes, Jul 25 '12 at 11:27

Howard Hinnant · Accepted Answer · 2014-11-20T23:39:16.940

I've been using Howard Hinnant's stack allocator and it works like a charm, but some details of the implementation are a little unclear to me.

Glad it's been working for you.

1. Why are global operators new and delete used? The allocate() and deallocate() member functions use ::operator new and ::operator delete respectively. Similarly, the member function construct() uses the global placement new. Why not allow for any user-defined global or class-specific overloads?

There's no particular reason. Feel free to modify this code in whatever way works best for you. This was meant to be more of an example, and it is by no means perfect. The only requirements are that the allocator and deallocator supply properly aligned memory, and that the construct member constructs an argument.

In C++11, the construct (and destroy) members are optional. I would encourage you to remove them from the allocator if you're operating in an environment that supplies allocator_traits. To find out, just remove them and see if things still compile.

2. Why is alignment set to hard-coded 16 bytes instead of std::alignment_of<T>?

std::alignment_of<T> would probably work fine. I was probably being paranoid that day.

3. Why do the constructors and max_size have a throw() exception specification? Isn't this discouraged (see e.g. More Effective C++ Item 14.)? Is it really necessary to terminate and abort when an exception occurs in the allocator? Does this change with the new C++11 noexcept keyword?

These members just won't ever throw. For C++11 I should update them to noexcept. In C++11 it becomes more important to decorate things with noexcept, especially special members. In C++11 one can detect whether an expression is nothrow or not. Code can branch depending on that answer. Code that is known to be nothrow is more likely to cause generic code to branch to a more efficient path. std::move_if_noexcept is the canonical example in C++11.

Don't use throw(type1, type2) ever. It has been deprecated in C++11.

Do use throw() when you really want to say: This will never throw, and if I'm wrong, terminate the program so I can debug it. throw() is also deprecated in C++11, but has a drop-in replacement: noexcept.

4. The construct() member function would be an ideal candidate for perfect forwarding (to the constructor that is being called). Is this the way to write C++11 conformant allocators?

Yes. However allocator_traits will do it for you. Let it. The std::lib has already debugged that code for you. C++11 containers will call allocator_traits<YourAllocator>::construct(your_allocator, pointer, args...). If your allocator implements these functions, allocator_traits will call your implementation, else it calls a debugged, efficient, default implementation.

5. What other changes are necessary to make the current code C++11 conformant?

To tell you the truth, this allocator isn't really C++03 or C++11 conformant. When you copy an allocator, the original and the copy are supposed to be equal to each other. In this design, that is never true. However this thing still just happens to work in many contexts.

If you want to make it strictly conforming, you need another level of indirection such that copies will point to the same buffer.

Aside from that, C++11 allocators are so much easier to build than C++98/03 allocators. Here's the minimum you must do:

template <class T>
class MyAllocator
{
public:
    typedef T value_type;

    MyAllocator() noexcept;  // only required if used
    MyAllocator(const MyAllocator&) noexcept;  // copies must be equal
    MyAllocator(MyAllocator&&) noexcept;  // not needed if copy ctor is good enough
    template <class U>
        MyAllocator(const MyAllocator<U>& u) noexcept;  // requires: *this == MyAllocator(u)

    value_type* allocate(std::size_t);
    void deallocate(value_type*, std::size_t) noexcept;
};

template <class T, class U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;

template <class T, class U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;

You might optionally consider making MyAllocator Swappable and put the following nested type in the allocator:

typedef std::true_type propagate_on_container_swap;

There's a few other knobs like that you can tweak on C++11 allocators. But all of the knobs have reasonable defaults.

Update

Above I note that my stack allocator is not conforming due to the fact that copies are not equal. I've decided to update this allocator to a conforming C++11 allocator. The new allocator is called short_allocator and is documented here.

The short_allocator differs from the stack allocator in that the "internal" buffer is no longer internal to the allocator, but is now a separate "arena" object that can be located on the local stack, or given thread or static storage duration. The arena isn't thread safe though so watch out for that. You could make it thread safe if you wanted to, but that has diminishing returns (eventually you'll reinvent malloc).

This is conforming because copies of allocators all point to the same external arena. Note that the unit of N is now bytes, not number of T.

One could convert this C++11 allocator to a C++98/03 allocator by adding the C++98/03 boiler-plate (the typedefs, the construct member, the destroy member, etc.). A tedious, but straightforward task.

The answers to this question for the new short_allocator remain unchanged.

Your original code works on VC++ with the following modifications: disable warning 4100 (it thinks that `p` is unreferenced in `p->~T()`), and some reshuffling of pointer arithmetic: instead of `(pointer)&buf_ + N - ptr_ >= n`, I had to write `ptr_ + n <= (pointer)&buf_ + N` to avoid spurious signed/unsigned errors. — TemplateRex, Jul 25 '12 at 15:14
VC++ does not have ``, and I have manually written a few overloads for construct. I also wrote `pointer begin() { return reinterpret_cast &buf_; }` to emphasize the pointer conversion, and `pointer end() { return begin() + N; }` for convenience. — TemplateRex, Jul 25 '12 at 15:17
Perhaps an interesting performance stat: I use `stack_alloc` in a checkers/draughts game engine that generates around 1 million move lists per second, where a move is a 24-byte struct, and I use `N=32` for the allocator. The speed is more or less identical to the combo `std::allocator` and `vector.reserve(32)`, but without the hassle of having to write this everywhere (or of having to write a container wrapper that does this). Performance is within 12% of `std::array`, but without the safety concern. See this thread on a computer chess forum: http://talkchess.com/forum/viewtopic.php?t=39934 — TemplateRex, Jul 25 '12 at 15:24
@rhalbersma : That thread is really only interesting if one enjoys the ramblings of a C++ hater. :-P — ildjarn, Jul 25 '12 at 19:06
@HowardHinnant One more Microsoft specific wart: in Debug mode, the code compiles but at runtime it will hit asserts emitted by the iterator debugging macro `_BLOCK_TYPE_IS_VALID(pHead->nBlockUse)` in the `operator delete`. — TemplateRex, Jul 26 '12 at 08:17
@HowardHinnant, std::maxalign_t is not defined in all libraries. Instead of setting it to 16 I have found following suggestion howto compute it: http://t5721.codeinpro.us/q/515022e3e8432c042620e111. If memory space is really scarce, it is possible to do a per case calculation using bitwise operations and the alignof operator as in the following answer: http://stackoverflow.com/questions/8752546/how-does-malloc-understand-alignment/18479609#18479609 I wonder why placement new does not do it automatically. Maybe because unaligned mem is legal for some compilers and only results in slower code? — Patrick Fromberg, Oct 10 '13 at 20:36
@HowardHinnant, I just see that you only align the memory buffer but not the objects in the buffer (i.e. only the first object in the buffer is well aligned). I thought the objects themselves must be well aligned but the buffer does not matter. Reference: http://stackoverflow.com/questions/11781724/do-i-really-have-to-worry-about-alignment-when-using-placement-new-operator — Patrick Fromberg, Oct 10 '13 at 21:13
@HowardHinnant, the links to the allocator don't seem to function any longer. Has the code been uploaded somewhere else? — Fabio A., Nov 20 '14 at 17:51
@HowardHinnant, is there something like this in boost or the standard library? If not, are there plans for that? — Jon, Jan 30 '15 at 04:58
Boost has some allocators. I'm not familiar with them. Just search for "allocator" in the boost docs. Currently the only allocator in the standard is `std::allocator`. Here http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3525.pdf is a proposal to add more. — Howard Hinnant, Jan 30 '15 at 16:31
Isn't `pointer_in_buffer´ exhibiting undefined behavior if the pointer doesn't point into the buffer? I think you can't compare pointers to objects that are not part of the same array or struct with the relational operators `<,>,<=,>=`. It is explicitly UB in C, but in c++ standard, there just seems to be no mention of what should happen in this case. See also [this question](http://stackoverflow.com/questions/27766570/how-do-i-safely-and-sensibly-determine-whether-a-pointer-points-somewhere-into-a) — MikeMB, Sep 11 '15 at 07:31
@MikeMB: Yes. This is the sort of thing that is UB because 6 guys in a dark room said so a few decades ago. If they had said something else, it would be something else. The reason they said so is because at the time segmented architectures were a clear and present danger. My code above assumes a flat address space, and a compiler that won't arbitrarily change your code based on that outdated decision from decades ago (which is an increasingly dangerous assumption as compiler optimizers move ever further into the too-clever-by-half arena). — Howard Hinnant, Sep 11 '15 at 17:54
@MikeMB Using N4527, [expr.rel]/4 leaves unrelated pointer comparisons *unspecified* (see [defns.unspecified]), not *undefined* (see [defns.undefined]). The easy fix for `pointer_in_buffer` is to use `std::less_equal` for which it is specified that "For templates `greater`, `less`, `greater_equal`, and `less_equal`, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not." (see [comparisons]/14). Looking at the libc++ and libstdc++ sources, there don't appear to be any specializations for pointer comparisons in ``, though. — TemplateRex, Sep 15 '15 at 06:17
@TemplateRex: The reason there aren't any specializations in libc++ and libstdc++ is because std::libs are allowed to write non-portable code so that the clients of the std::libs don't have to. And libc++ and libstdc++ have chosen not to target platforms where such specializations would have different syntax than the primary template. — Howard Hinnant, Sep 15 '15 at 14:16
@TemplateRex: Thank you very much, I've read [expr.rel] multiple times and always read over that passage. — MikeMB, Sep 15 '15 at 15:36
0 @HowardHinnant vector.reserve() with the related stack size must be called on the vector, before pushing something to the vector. Else, the "grow" (realloc) method of the vector, will consume the stack space unnecessarily. At least this is what I could observe with Visual Studio 2017 implementation of std::vector. — Philippe, May 01 '19 at 14:41
More up-to-date post on this allocator: https://stackoverflow.com/a/34211227/576911 — Howard Hinnant, May 01 '19 at 15:44

Questions about Hinnant's stack allocator

1 Answers1

Linked