Fixed size array vs alloca (or VLAs)

Question

When is alloca() preferable to memory allocated on the stack by declaring a fixed size array?

Details:

As we know, alloca() is a controversial function. Used recklessly, it can cause stack overflow. Used judiciously, it can shave a few nanoseconds from a tight loop by avoiding heap allocation. In this question about why alloca is considered bad, several of the top answers advocate for the occasional use of alloca.

Another way to allocate from the stack is to simply declare a fixed size array. An example of this strategy can be found in the arena class in Howard Hinnant's stack allocator. (That code is of course C++ but the concept is still applicable to C.)

What are the tradeoffs of using alloca vs a fixed size array? When, if ever, is one clearly preferable to the other? Is it simply a question of performance that should be empirically tested in each individual situation (when performance is a key goal and a hotspot has already been identified)? The fixed size array is more pessimistic -- it always allocates as much as we're willing to allocate on the stack -- but it's not clear whether this is good or bad.

Just to be as clear as possible, here's a very simple example of two function implementations where it seems reason to use either alloca or a fixed size array:

#include <alloca.h>
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

void foo_alloca(const size_t mem_needed) {
    printf("foo_alloca(%zu)\n", mem_needed);
    char* mem;
    bool used_malloc = false;
    if (mem_needed <= 100)
        mem = alloca(mem_needed);
    else {
        mem = malloc(mem_needed);
        used_malloc = true;
    }
    assert(mem_needed != 0);
    // imagine we do something interesting with mem here
    mem[0] = 'a';
    mem[1] = 'b';
    mem[2] = 'c';
    mem[3] = '\0';
    puts(mem);
    if (used_malloc)
        free(mem);
}

void foo_fixed(const size_t mem_needed) {
    printf("foo_fixed(%zu)\n", mem_needed);
    char* mem;
    char stack_mem[100];
    bool used_malloc = false;
    if (mem_needed <= 100)
        mem = stack_mem;
    else {
        mem = malloc(mem_needed);
        used_malloc = true;
    }
    assert(mem_needed != 0);
    // imagine we do something interesting with mem here
    mem[0] = 'a';
    mem[1] = 'b';
    mem[2] = 'c';
    mem[3] = '\0';
    puts(mem);
    if (used_malloc)
        free(mem);
}

int main()
{
    foo_alloca(30);
    foo_fixed(30);
    foo_alloca(120);
    foo_fixed(120);
}

Another option very similar to alloca is VLAs. As far as I know, memory obtained from alloca and VLAs have essentially the same behavior so the question applies to VLAs as well. If that understanding is wrong just mention it.

This is `C` code. 1) The `malloc` call is not casted -- that doesn't work with C++, and 2) There are no VLA's in the C++ language. — PaulMcKenzie, Oct 03 '16 at 20:29
@MadPhysicist I should have been more creative. It's not a great example. Imagine that `mem_needed` actually describes the needed memory and we do more than write "abc" to it. — Praxeolitic, Oct 03 '16 at 20:33
@PaulMcKenzie I think this is relevant to both C and C++ and a lot of people with good insights might not look at this question if it's not tagged C++. It is true that there isn't a way to make the example code both idiomatic C and C++ because of malloc and casting but I don't think that means C++ programmers will have no interest. — Praxeolitic, Oct 03 '16 at 20:38
If the function gets called recursively, a small over-allocation can quickly grow into a huge over-allocation. — Riley, Oct 03 '16 at 20:39
@Riley That sounds like a valid argument for `alloca`. My one reservation is that I'm not sure if `alloca` rounds sizes up or not. That said, recursion + `alloca` would be questionable but it's still a good argument. — Praxeolitic, Oct 03 '16 at 20:42
I'm going mostly off of assumptions here so don't quote me. I can't think of any reason that it would allocate anything more than the exact amount requested. `malloc` has to take into account managing memory in a way that it can free an reallocate memory efficiently. On the stack it can just move the stack pointer back however far it needs to, and be done with it. — Riley, Oct 03 '16 at 20:48
@Riley Yep, I'm pretty sure what you've said is the primary benefit of `alloca` and the primary drawback is that since the stack size becomes variable generated code might be less efficient. (Always nice to have someone authoritative chime in though.) — Praxeolitic, Oct 03 '16 at 20:51
`alloca` would definitely be slower. After all of the normal stack allocation has happened you then have to make a system call which is expensive. — Riley, Oct 03 '16 at 20:55
@Riley I suspect `alloca` usually doesn't need to enter kernel mode. If it does, it probably only needs to extend the stack space which wouldn't happen on every call. I have no idea how to actually determine whether a glibc function enters kernel mode though. — Praxeolitic, Oct 03 '16 at 21:06
After a system call, does the same process always get put back in the CPU, or does it just go to the waiting state at which point the scheduler probably picks something else? — Riley, Oct 03 '16 at 21:12
@Riley My key point was that a library call (e.g. `alloca` or anything else in glibc) doesn't necessarily make a system call internally (which, like you point out, is expensive because it incurs context switching). A good example is `strlen` which can clearly be implemented in userspace. That said, I don't know exactly how a system call affects scheduling. — Praxeolitic, Oct 03 '16 at 21:17
Oops, I wasn't thinking. I just assumed that it would be a system call because most other memory management functions are, so it probably doesn't actually make a system call. — Riley, Oct 03 '16 at 21:33
After compiling and running a simple test with `strace`, it seems that `alloca` does not make a system call. Therefore, it shouldn't be much slower than a fixed array. `alloca` doesn't give any warning when you run out of memory, it's just UB [see here](http://stackoverflow.com/a/1018865/6697083) — Riley, Oct 03 '16 at 21:45
@Praxeolitic I just stumbled across your comments, and wanted to clarify a couple of things. 1) a system call that needs to enter kernel mode will perform a *mode* change. a *context* switch is when the cpu *contexts* are switched, which happens when a new thread is run. system calls are evaluated in the context of the running thread, so no context switch is needed. 2) malloc will usually not involve a mode change to kernel mode, unless a new *page* of memory (usially 4k) is needed. but that will also happen if the stack grows beyond the currently allocated pages (page fault -> mode change) — Andreas Grapentin, Apr 09 '18 at 11:50

score 4 · Accepted Answer · answered Oct 03 '16 at 22:14

4

What are the trade-offs of using alloca() vs a fixed size array?

Portability. alloca() is not a standard C library function. Fixed size arrays are part of the language.
Analyzability. Tools that analyze code memory usage regularly support stack depth analysis via fixed side arrays. alloc() analyzability may/may not exist.
Space efficiency. alloca() allocates the proscribed memory space. Fixed size arrays tend to over allocate.
Code efficiency/speed is certainly an implementation issue and profiling would be needed to compare performance. A significant difference is not expected.
VLA pros/cons is like alloca() except that is is part of the C99 standard, yet only optional in C11.

answered Oct 03 '16 at 22:14

chux - Reinstate Monica

143,097
13
135
256

Could you give an example of a tool that analyzes stack depth? I'm not familiar with this sort of analysis. Thanks. – Praxeolitic Oct 03 '16 at 22:52
@Praxeolitic 1st one that comes to mind: [CCS](http://www.ccsinfo.com/content.php?page=compilers) analyses stack/memory usage. By not allowing recursion, the absolute maximum stack depth/ memory usage is calculable and is very important with compilers that work in restricted embedded memory environments. – chux - Reinstate Monica Oct 04 '16 at 01:45

Fixed size array vs alloca (or VLAs)

1 Answers1