1

I am writing a container that uses alloca internally to allocate data on the stack. Risks of using alloca aside, assume that I must use it for the domain I am in (it's partly a learning exercise around alloca and partly to investigate possible implementations of dynamically-sized stack-allocated containers).

According to the man page for alloca (emphasis mine) :

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

Using implementation-specific features, I have managed to force inlining in such a way that the callers stack is used for this function-level "scoping".

However, that means that the following code will allocate a huge amount of memory on the stack (compiler optimisations aside):

for(auto iteration : range(0, 10000)) {
    // the ctor parameter is the number of
    // instances of T to allocate on the stack,
    // it's not normally known at compile-time
    my_container<T> instance(32);
}

Without knowing the implementation details of this container, one might expect any memory it allocates to be free'd when instance goes out of scope. This is not the case and can result in a stack overflow / high memory usage for the duration of the enclosing function.

One approach that came to mind was to explicitly free the memory in the destructor. Short of reverse engineering the resulting assembly, I haven't found a way of doing that yet (also see this).

The only other approach I have thought of is to have a maximum size specified at compile-time, use that to allocate a fixed-size buffer, have the real size specified at runtime and use the fixed-size buffer internally. The issue with this is that it's potentially very wasteful (suppose your maximum were 256 bytes per container, but you only needed 32 most of the time).

Hence this question; I want to find a way to provide these scope semantics to the users of this container. Non-portable is fine, so long as it's reliable on the platform its targeting (for example, some documented compiler extension that only works for x86_64 is fine).

I appreciate this could be an XY problem, so let me restate my goals clearly:

  • I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).
  • The size of the container is not known at compile-time.
  • I would like to maintain the semantics of the memory as if it were held by an std::unique_ptr inside of the container.
  • Whilst the container must have a C++ API, using compiler extensions from C is fine.
  • The code need only work on x86_64 for now.
  • The target operating system can be Linux-based or Windows, it doesn't need to work on both.
OMGtechy
  • 7,935
  • 8
  • 48
  • 83
  • 1
    The statement "container that must always allocate its memory on the stack" does not compute, insofar as C++ goes. The container itself may be allocated on the stack (automatic scope) or the heap (dynamic scope), which is controlled entirely by whatever instantiates the container. But the container itself has absolutely no influence on that, whatsoever. Perhaps you're asking how to declare a class that can only be declared in automatic scope. This cannot be done in C++. – Sam Varshavchik Mar 18 '18 at 02:07
  • You could write an allocator based on `alloca` instead of `sbrk` like you'd normally do with `malloc` – Passer By Mar 18 '18 at 02:08
  • I tried this. I was unable to allocate the memory in the constructor because the memory is freed when the constructor ends. The best I could do was call the array element's constructors in the container's constructor and call their destructors in the destructor. But the allocation itself had to happen before creating the container. To that end I created a MACRO which is a bit of a fudge. But it lets me call `alloca` outside the constructor. – Galik Mar 18 '18 at 02:13
  • 3
    Space allocated on the stack is freed when the function returns. Since that isn't what you want, why are you determined that you want to allocate space on the stack? – prl Mar 18 '18 at 02:13
  • 2
    @SamVarshavchik: The container could be allocated on a pile of £20 notes as far as C++ cares. – Lightness Races in Orbit Mar 18 '18 at 02:16
  • If you declare a large block of memory in a function higher up the call chain, but not `main()`, whose local variables have `static` duration now, you can pass the bottom of the stack and the current stack pointer down the chain and create your stack frames by subtracting from the stack pointer. However, at that point, you might as well not make it automatic. – Davislor Mar 18 '18 at 02:19
  • In some implementations, thread-local variables are created in the stack segment. – Davislor Mar 18 '18 at 02:20
  • 1
    @LightnessRacesinOrbit I like the sound of that – Passer By Mar 18 '18 at 02:20
  • Is your problem that you want your container to be resizeable? – Galik Mar 18 '18 at 02:28
  • Your concept of a CVLA is a bit off, see [C11 Standard - 6.2.4 Storage durations of objects (p7)](http://port70.net/~nsz/c/c11/n1570.html#6.2.4p7) and [What's the difference between a VLA and dynamic memory allocation via malloc?](https://stackoverflow.com/questions/31199566/whats-the-difference-between-a-vla-and-dynamic-memory-allocation-via-malloc) and [malloced array VS. variable-length-array](https://stackoverflow.com/questions/16672322/malloced-array-vs-variable-length-array) – David C. Rankin Mar 18 '18 at 03:52
  • @Galik once it's been created, the size can be fixed, but that size isn't known until runtime and it can't be heap allocated. – OMGtechy Mar 18 '18 at 09:36
  • @DavidC.Rankin thanks, I'll have a read :) – OMGtechy Mar 18 '18 at 09:36
  • @Galik I got around the scope issue by forcing the constructor to be inlined, which meant alloca was called inside the function using the object – OMGtechy Mar 18 '18 at 10:09

1 Answers1

3

I am writing a container that must always allocate its memory on the stack (to the best of my knowledge, this rules out C VLAs).

The normal implementation of C VLAs in most compilers is on the stack. Of course ISO C++ doesn't say anything about how automatic storage is implemented under the hood, but it's (nearly?) universal for C implementations on normal machines (that do have a call+data stack) to use that for all automatic storage including VLAs.

If your VLA is too large, you get a stack overflow rather than a fallback to malloc / free.

Neither C nor C++ specify alloca; it's only available on implementations that have a stack like "normal" machines, i.e. the same machines where you can expect VLAs to do what you want.

All of these conditions hold for all the major compilers on x86-64 (except that MSVC doesn't support VLAs).


If you have a C++ compiler that supports C99 VLAs (like GNU C++), smart compilers may reuse the same stack memory for a VLA with loop scope.


have a maximum size specified at compile-time, use that to allocate a fixed-size buffer ... wasteful

For a special case like you mention, you could maybe have a fixed-size buffer as part of the object (size as a template param), and use that if it's big enough. If not, dynamically allocate. Maybe use a pointer member to point to either the internal or external buffer, and a flag to remember whether to delete it or not in the destructor. (You need to avoid delete on an array that's part of the object, of course.)

// optionally static_assert (! (internalsize & (internalsize-1), "internalsize not a power of 2")
// if you do anything that's easier with a power of 2 size
template <type T, size_t internalsize>
class my_container {
    T *data;
    T internaldata[internalsize];
    unsigned used_size;
    int allocated_size;   // intended for small containers: use int instead of size_t
    // bool needs_delete;     // negative allocated size means internal
}

The allocated_size only needs to be checked when it grows, so I made it signed int so we can overload it instead of needing an extra boolean member.

Normally a container uses 3 pointers instead of pointer + 2 integers, but if you don't grow/shrink often then we save space (on x86-64 where int is 32 bits and pointers are 64-bit), and allow this overloading.

A container that grows large enough to need dynamic allocation should continue using that space but then shrinks should keep using the dynamic space, so it's cheaper to grow again, and to avoid copying back into the internal storage. Unless the caller uses a function to release unused excess storage, then copy back.

A move constructor should probably keep allocation as-is, but a copy constructor should copy into the internal buffer if possible instead of allocating new dynamic storage.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Looks like my understanding of VLAs was off, as you and David C pointed out. I thought it'd fallback to heap allocation. I just ran some tests locally and on my system at least, it behaves as you described. Are there any guarantees of this? – OMGtechy Mar 18 '18 at 10:08
  • @OMGtechy: Guarantee that the memory can be reclaimed / reused as soon as it goes out of scope? No, I'd only expect reuse if it's the same VLA inside a loop. Compilers have to know how to do that at `-O0` to make sane code for loops containing declarations of regular array and scalar variables, and it's no different for VLAs. But if you had 2 different scopes containing 2 different VLAs (even if they had the same size) you might or might not find the compiler optimizes by reusing the space, and maybe only `-O2`. Even if it's both sides of an `if else`, so they can't both be used in one trip. – Peter Cordes Mar 18 '18 at 13:05
  • How could you use a VLA to implement a container though? The VLA will go out of scope when the constructor terminates. Or are there C++ extensions that allow VLAs as members or something like that? – BeeOnRope Mar 18 '18 at 18:20
  • I guess you could use a macro to expand your `my_container` definition into a VLA local helper object in the same scope and a call to the `my_container` constructor which takes a pointer to the VLA and uses it as storage. – BeeOnRope Mar 18 '18 at 18:30
  • @BeeOnRope: However the OP was doing it with `alloca`, which is also deallocated on leaving the constructor scope. (I think they mentioned macros in the question.) The first section of the answer is just to explain how VLAs *do* work, not how you can actually use them to solve this problem. Problems like that are why I wrote the 2nd part of the answer, suggesting a workaround that gives you normal C++ objects but avoids dynamic allocation for small sizes. (i.e. mostly solving the same problem without variable-size stack allocation). – Peter Cordes Mar 19 '18 at 01:55
  • Right, but the OP was able to get around that problem with `alloca` by force-inlining the constructor. That wouldn't seem to work in general for VLAs, which are have lifetime tied scopes, not functions. – BeeOnRope Mar 21 '18 at 04:17
  • @BeeOnRope: Ah, I hadn't realized that built-in `alloca` stayed allocated after inlining. Yeah, that would let you still use functions, not macros. I don't think you can return a VLA by value, so probably pure macros without functions would be the only way to go. – Peter Cordes Mar 21 '18 at 04:58