Is using alloca() for variable length arrays better than using a vector on the heap?

Question

I have some code using a variable length array (VLA), which compiles fine in gcc and clang, but does not work with MSVC 2015.

class Test {
public:
    Test() {
        P = 5;
    }
    void somemethod() {
        int array[P];
        // do something with the array
    }
private:
    int P;
}

There seem to be two solutions in the code:

using alloca(), taking the risks of alloca in account by making absolutely sure not to access elements outside of the array.
using a vector member variable (assuming that the overhead between vector and c array is not the limiting factor as long as P is constant after construction of the object)

The ector would be more portable (less #ifdef testing which compiler is used), but I suspect alloca() to be faster.

The vector implementation would look like this:

class Test {
public:
    Test() {
        P = 5;
        init();
    }
    void init() {
        array.resize(P);
    }
    void somemethod() {
        // do something with the array
    }
private:
    int P;
    vector<int> array;
}

Another consideration: when I only change P outside of the function, is having a array on the heap which isn't reallocated even faster than having a VLA on the stack?

Maximum P will be about 400.

C++ doesn't have variable length arrays. GCC and Clang only offer it as an extension. So don't hold your breath on portability. — StoryTeller - Unslander Monica, Sep 28 '17 at 09:04
Although `alloca` is a *de facto* standard it's not portable, because implementations differ in how failure is reported, or whether it is. Also you don't want to eat up machine stack. Use `std::vector`. — Cheers and hth. - Alf, Sep 28 '17 at 09:05
@Cheersandhth.-Alf isn't `std::array` more appropriate here? It's the closest thing to VLAs. — Jabberwocky, Sep 28 '17 at 09:09
The problem is, that I probably want as much performance as possible. While the array size is usually about ``15`` the function using VLA is called millions of times. Maybe ``std::array``, the important thing is fast element access. The array should probably be static, because creating a new object each time will slow down the function a lot more than creating the VLA. — allo, Sep 28 '17 at 09:10
@MichaelWalz no, std::array is not close to a VLA. It us close to a plain old valilla array. — n. m. could be an AI, Sep 28 '17 at 09:11
@MichaelWalz: The OP's example is not real code so it's hard to say. But offhand, I'd say that the original choice of *variable length* array sort of indicates variable length. Still, let's not assume. ;-) — Cheers and hth. - Alf, Sep 28 '17 at 09:12
@allo if the array size is small you could have a fixed size array with the maximum possible size. This will certainly waste some stack space, but for a size of less the 100 or even 1000, this is not an issue. This is the fastest you can get. — Jabberwocky, Sep 28 '17 at 09:12
@n.m, nwp: I think it has to be known at object creation time? I have an initialization method, which currently resizes a static vector one time, which is then used instead of the VLA in the method. — allo, Sep 28 '17 at 09:13
*" I probably want as much performance as possible"* - Did you profile both solutions? You'd be surprised by what assumptions don't hold in your assessment. — StoryTeller - Unslander Monica, Sep 28 '17 at 09:13
also using alloca() might be dangerous once you cross certain size, you will find this in alloca's man page... "The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined." — spt025, Sep 28 '17 at 09:16
feel free to use it if you are not going to overflow stack.. less hassle as well for cleanup. — spt025, Sep 28 '17 at 09:17
@StoryTeller I converted the VLA to the vector solution to get it to compile. I cannot compare the exact code to gcc/clang as it is in a project which would need much work on legacy code to compile with other compilers. The point is, this solution seems to be no exact equivalent to me (i.e. stack vs. heap), so I ask myself if I should try using alloca() of if its not worth the efford. — allo, Sep 28 '17 at 09:19
@allo - "Stack vs. Heap" is a false concern. It really is. There are plenty of fast "heap" allocator designs. — StoryTeller - Unslander Monica, Sep 28 '17 at 09:20
@allo https://stackoverflow.com/questions/1018853/why-is-the-use-of-alloca-not-considered-good-practice you might find this useful for read. — spt025, Sep 28 '17 at 09:21
Thank you. I came from this an a few other questions to what actually is a VLA, when I researched why MSVC does not support this. I think I can manage to make sure not to write outside of the range. The methods are not too complex. — allo, Sep 28 '17 at 09:28
@allo: **did you actually benchmark your code?** How? With what timing results? That should go in your question! — Basile Starynkevitch, Sep 28 '17 at 09:38
Voted to close as not providing real code, or clear description of the real code. — Cheers and hth. - Alf, Sep 28 '17 at 09:41
Use std::vector. Test and see if the performance is inadequate. **If it is**, profile and see if std::vector is a bottleneck. **If it is**, replace it with a custom solution. — n. m. could be an AI, Sep 28 '17 at 09:48

Basile Starynkevitch · Accepted Answer · 2017-09-28T09:53:53.220

3

You could and probably should use some dynamically allocated heap memory, such as managed by a std::vector (as answered by Peter). You could use smart pointers, or plain raw pointers (new, malloc,....) that you should not forget to release (delete,free,....). Notice that heap allocation is probably faster than what you believe (practically, much less than a microsecond on current laptops most of the time).

Sometimes you can move the allocation out of some inner loop, or grow it only occasionally (so for a realloc-like thing, better use unsigned newsize=5*oldsize/4+10; than unsigned newsize=oldsize+1; i.e. have some geometrical growth). If you can't use vectors, be sure to keep separate allocated size and used lengths (as std::vector does internally).

Another strategy would be to special case small sizes vs bigger ones. e.g. for an array less than 30 elements, use the call stack; for bigger ones, use the heap.

If you insist on allocating (using VLAs -they are a commonly available extension of standard C++11- or alloca) on the call stack, be wise to limit your call frame to a few kilobytes. The total call stack is limited (e.g. often to about a megabyte or a few of them on many laptops) to some implementation specific limit. In some OSes you can raise that limit (see also setrlimit(2) on Linux)

Be sure to benchmark before hand-tuning your code. Don't forget to enable compiler optimization (e.g. g++ -O2 -Wall with GCC) before benchmarking. Remember that caches misses are generally much more expensive than heap allocation. Don't forget that developer's time also has some cost (which often is comparable to cumulated hardware costs).

Notice that using static variable or data has also issues (it is not reentrant, not thread safe, not async-signal-safe -see signal-safety(7) ....) and is less readable and less robust.

edited Sep 28 '17 at 09:53

answered Sep 28 '17 at 09:22

Basile Starynkevitch

223,805
18
296
547

My first fix was to use malloc/free in the method, but this of course is slow. So either a static variable (and then probably a STL vector is fast enough) or something like alloca. A third option would be to use an array with size MAX_P. – allo Sep 28 '17 at 09:25
*The total call stack is limited (e.g. to about a megabyte or a few of them).* That's highly implementation-dependent - both OS and application. For example, [a 32-bit Linux process](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Growing_the_Oracle_SGA_to_2.7_GB_in_x86_Red_Hat_Enterprise_Linux_2.1_Without_VLM-Linux_Memory_Layout.html) has up to 1 GB for the heap, but can have up to 2 GB for the stack. – Andrew Henle Sep 28 '17 at 09:32
Yes, I know that. – Basile Starynkevitch Sep 28 '17 at 09:34
I know you know. But most readers won't. Way too many are probably stuck on a "The stack is always small, the heap is always much bigger" fallacy. Using huge pre-allocated stacks and VLAs/`alloca()` is one way to get better performance from multithreaded memory-intensive applications where memory requirements are known. – Andrew Henle Sep 28 '17 at 09:39
@AndrewHenle: most of the time stack is smaller than heap. – Basile Starynkevitch Sep 28 '17 at 09:44
1

@BasileStarynkevitch *most of the time stack is smaller than heap.* Are you referring to actual "normal" usage, or the limits? For 64-bit processes in most such cases, the actual *de facto* **limit** for both the heap and the stack is the same: the amount of virtual memory available to the process. And as I linked earlier, for 32-bit Linux processes the stack can possibly be twice as large as the heap. – Andrew Henle Sep 28 '17 at 10:21

score 3 · Answer 2 · edited Sep 28 '17 at 12:28

3

First of all, you're getting lucky if your code compiles with ANY C++ compiler as is. VLAs are not standard C++. Some compilers support them as an extension.

Using alloca() is also not standard, so is not guaranteed to work reliably (or even at all) when using different compilers.

Using a static vector is inadvisable in many cases. In your case, it gives behaviour that is potentially not equivalent to the original code.

A third option you may wish to consider is

 // in definition of class Test
void somemethod()
{
    std::vector<int> array(P);      // assume preceding #include <vector>
    // do something with array
}

A vector is essentially a dynamically allocated array, but will be cleaned up properly in the above when the function returns.

The above is standard C++. Unless you perform rigorous testing and profiling that provides evidence of a performance concern this should be sufficient.

edited Sep 28 '17 at 12:28

spt025

2,134
2
20
26

answered Sep 28 '17 at 09:32

Peter

35,646
4
32
74

This solution would be like ``malloc``/``free`` be too slow to do in each call. Can you elaborate on the static vector not being equivalent? – allo Sep 28 '17 at 09:35
@allo static means there is only one copy of it for the whole program, so if you have two instances of your object existing at once it will not behave correctly – M.M Sep 28 '17 at 09:41
Ah, yes. This is currently no problem, but would be a factor for more reusability when another solution is fast enough. – allo Sep 28 '17 at 09:43
1

Using vector is not really equivalent to using `malloc()` and `free()`. In any event your assumption that you need to avoid dynamic memory allocation is flawed. Unless you have EVIDENCE through testing/profiling then all you're doing is premature optimisation. And, depending on your compiler and host system, quite possibly degrading performance by making such an assumption. – Peter Sep 28 '17 at 09:51
1

@allo But if there is only one for the whole program then you lose nothing by making it a non-static class member – M.M Sep 28 '17 at 10:04
Indeed. My actual structure is a bit different (not everything is in a class), but for the point here we can use a class member. I think the performance questions for global static vs. class member are similiar, aren't they? – allo Sep 28 '17 at 10:26
1

@allo - maybe, maybe not. You seem to be trying to make blanket assumptions about what gives or does not give optimal performance (static, class member, dynamic memory allocation, etc). No such blanket statement is possible with modern systems, hence the need to test/profile. Modern compilers and CPUs can and do break plenty of assumptions that mere programmers might make. – Peter Sep 28 '17 at 10:31

score 1 · Answer 3 · answered Sep 28 '17 at 10:06

Why don't you make the array a private member?

#include <vector>

class Test
{
public:
    Test()
    {
        data_.resize(5);
    }
    void somemethod()
    {
        // do something with data_
    }
private:
    std::vector<int> data_;
}

As you've specified a likely maximum size of the array, you could also look at something like boost::small_vector, which could be used like:

#include <boost/container/small_vector.hpp>

class Test
{
public:
    Test()
    {
        data_.resize(5);
    }
    void somemethod()
    {
        // do something with data_
    }
private:
    using boc = boost::container;

    constexpr std::size_t preset_capacity_ = 400;
    boc::small_vector<int, preset_capacity_> data_;
}

You should profile to see if this is actually better, and be aware this will likely use more memory, which could be an issue if there are many Test instances.

Is using alloca() for variable length arrays better than using a vector on the heap?

3 Answers3

Linked