25

When performance is essential to an application, should I consider whether to declare an array on the stack vs. the heap? Let me outline why this question has come to mind.

Since arrays in C/C++ are not objects and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs between a statically declared array and a dynamically declared array once you go past the first dimension.

If I were to declare an array on the stack as follows:

  int array[2][3] = { 0, 1, 2, 3, 4, 5 };
  //In memory        { row1 } { row2 }

This array would be stored in row-major format, in one contiguous block of memory. This means when I try to access an element, the compiler must perform some addition and multiplication to ascertain the correct location.

So if I were to do the following

  int x = array[1][2]; // x = 5

The compiler would then use this formula, where:

i = row index, j = column index, n = number of elements in a single row (here n = 3),
array = pointer to the first element

  *(array + (i*n) + j)
  *(array + (1*3) + 2)

This means if I were to loop over this array to access each of its elements, an additional multiplication step is performed for each access by index.

Now, for an array declared on the heap, the paradigm is different and requires a multi-stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

  int ** array;
  int rowSize = 2;
  int colSize = 3;
  // Create a 2 by 3 2d array on the heap
  array = malloc(rowSize * sizeof(int*));
  for (int i = 0; i < rowSize; i++) {
      array[i] = malloc(colSize * sizeof(int));
  }

  // Populate the array
  int number = 0;
  for (int i = 0; i < rowSize; i++) {
      for (int j = 0; j < colSize; j++) {
          array[i][j] = number++;
      }
  }

Since the array is now dynamic, its representation is a one-dimensional array of one-dimensional arrays. I will try to draw an ASCII picture...

                 int*          int int int
  int** array -> [0] -------->  0   1   2
                 [1] -------->  3   4   5

This would imply that multiplication is no longer involved, right? If I were to do the following:

  int x = array[1][1]; // x = 4

This would then perform indirection/pointer arithmetic on array[1] to access a pointer to the second row and then perform this once again to access the second element. Am I correct in saying this?

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, like a game that has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs. the heap? I realize there is a one-time cost for using malloc or the new operator, but at a certain point (just like in Big O analysis), when the data set becomes large, would one be better off iterating through a dynamic array to avoid row-major indexing?

Paul Renton
    You could use [continue memory allocation for 2D](http://stackoverflow.com/a/15062765/1673391) for comparison – Grijesh Chauhan Jul 21 '13 at 17:44
  • Whatever you do, you're still doing row major (rather than column major). Try measuring it. – doctorlove Jul 21 '13 at 17:45
  • Stack vs. heap is a matter of how large the allocation is and when you know how large it is, not a matter of data layout. As Grijesh Chauhan already said, you can use the same layout as multi-dimensional static array on the heap, you just don't get syntactic sugar for it. You can also have a static array of pointers to arrays (and it sometimes makes sense, though the pointed-to arrays will often be allocated dynamically). –  Jul 21 '13 at 17:56
  • Another thing, your performance concerns seem misguided to me: The array-of-pointers-to-arrays may save an integer multiplication, but that is one of the cheaper things your CPU can do and in return you need an additional memory reference - which is hardly cheaper than a multiplication **even if** it hits L1 cache. Regardless of which will be faster, chances are it won't be among the bottlenecks. If it is, your profiler will tell you. It can be useful to think twice about low-level optimizations, but this is not something you should proactively worry about even in performance-critical code. –  Jul 21 '13 at 18:01
    If you're just iterating over the array, a standard optimisation technique called 'strength reduction' allows the multiplications to be converted to additions. That's been around for 40 years or so. Although multiplication is less of a major concern these days; cache misses are much worse. – Alan Stokes Jul 21 '13 at 18:17
    Another interesting question would be the performance of a static array implemented on the stack vs. a static array implemented in a data section. – Z boson Oct 17 '14 at 21:07

5 Answers

32

These will apply to "plain" C (not C++).

First let's clear some terminology

"static" is a keyword in C which will drastically change the way your variable is allocated / accessed if it is applied to variables declared within functions.

There are 3 places (regarding C) where a variable (including arrays) may sit:

  • Stack: these are function local variables without static.
  • Data section: space is allocated for these when the program starts. These are any global variables (whether static or not; there the keyword relates to visibility), and any function-local variables declared static.
  • Heap: dynamically allocated memory (malloc() & free()) referred by a pointer. You access this data only through pointers.

Now let's see how one dimensional arrays are accessed

If you access an array with a constant index (may be #defined, but not const in plain C), this index can be calculated by the compiler. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section with this type of access may be a tiny bit faster. But this is not a difference that will change the world.

If you access an array with an index variable, it essentially always decays to a pointer since the index may change (for example increment in a for loop). The generated code will likely be very similar or even identical for all types here.

Bring in more dimensions

If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.

If you access by indices, note that the memory is linear. If the later dimensions of a true array are not a power of 2, the compiler will need to generate multiplications. For example in the array int arr[4][12]; the second dimension is 12. If you now access it as arr[i][j] where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code to multiply with a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j to access the element in the array.

If you build up the two dimensional "array" from pointers, the size of the dimensions do not matter since there are reference pointers in your structure. Here if you can write arr[i][j], that implies you declared it as for example int* arr[4], and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler now can use as base) also consume memory which wouldn't be needed for a true array. Also note that here the generated code will contain a double indirection: First the code loads a pointer by i from arr, then it will load an int from that pointer by j.

If the lengths are "far" from powers of 2 (so complex "multiply with constant" codes would have to be generated to access the elements) then using pointers may generate faster access codes.

As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers, as the "array" is actually not a linear chunk of memory in that case.

Locality matters

If you are developing for usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors), locality also matters: that is, what is likely sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.

In terms of locality, the Stack is always the winner, since the Stack is used so frequently that it is very likely to sit in the cache. So small arrays are best put there.

Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground, since a true array is always a linear chunk of memory, so it usually needs fewer blocks of cache to load. A scattered pointer composition (that is, one using separately malloc()ed chunks), on the contrary, might need more cache blocks, and may cause cache line conflicts depending on how the chunks physically end up on the heap.

Jubatian
    Many great answers to my question, but this one made the most sense to me. Wish I could accept more than one. Thanks everyone! Learned quite a bit from this discussion – Paul Renton Jul 23 '13 at 00:32
  • I think there is at least one more type of variable type in the data section and that is thread local storage (e.g. with `__thread int x`) – Z boson Oct 17 '14 at 11:50
  • @RestlessC0bra The size of the sections depend on architecture and compiler, if you are really into it, you would most likely find them in a linker script. Usually you wouldn't have to care a lot about it as data's allocation is determined compile time (to fit all your global variables), and the remaining area on a microcontroller is used for stack and heap (on a desktop you typically never need to touch them). So most often what you get by default should be right. – Jubatian May 19 '17 at 12:02
4

As to which choice provides better performance, then the answer will largely depend on your specific circumstances. The only way to know if one way is better or if they are roughly equivalent is to measure the performance of your application.

Some things that would be a factor are: how often you do it, the actual size of the arrays/data, how much memory your system has, and how well your system manages memory.

If you have the luxury of being able to choose between the two, it must mean the sizes are already nailed down. Then you do not need the multiple-allocation scheme that you illustrated; you can perform a single dynamic allocation of your 2D array. In C:

int (*array)[COLUMNS];
array = malloc(ROWS * sizeof(*array));

In C++:

std::vector<std::array<int, COLUMNS>> array(ROWS);

As long as COLUMNS is nailed down, you can perform a single allocation to obtain your 2D array. If neither dimension is nailed down, then you don't really have the choice of using a static array anyway.

jxh
  • The C and C++ solutions you propose are radically different in terms of memory allocation. In general, for this sort of problem in C++, you define a `Matrix` class, which can use either `std::vector<std::vector<int>>` or `std::vector<int>` (and multiplication in the index operators), depending on which is faster on your machine. (Typically, if my experience is anything to go by, it is the latter.) – James Kanze Jul 21 '13 at 18:06
    On further regard, they aren't as different as I thought. I was thinking of `std::vector<std::vector<int>>`. – James Kanze Jul 21 '13 at 18:14
4

The usual way of implementing a 2 dimensional array in C++ would be to wrap it in a class, using std::vector<int>, and have class accessors which calculate the index. However:

Any questions concerning optimization can only be answered by measuring, and even then, they are only valid for the compiler you are using, on the machine on which you do the measurements.

If you write:

int array[2][3] = { ... };

and then something like:

for ( int i = 0; i != 2; ++ i ) {
    for ( int j = 0; j != 3; ++ j ) {
        //  do something with array[i][j]...
    }
}

It's hard to imagine a compiler which doesn't actually generate something along the lines of:

for ( int* p = &array[0][0]; p != &array[0][0] + 2 * 3; ++ p ) {
    //  do something with *p
}

This is one of the most fundamental optimizations around, and has been for at least 30 years.

If you dynamically allocate as you propose, the compiler will not be able to apply this optimization. And even for a single access: the matrix has poorer locality, and requires more memory accesses, so it would likely be less performant.

If you're in C++, you would normally write a Matrix class, using std::vector<int> for the memory, and calculating the indexes explicitly using multiplication. (The improved locality will probably result in better performance, despite the multiplication.) This could make it more difficult for the compiler to do the above optimization, but if this turns out to be an issue, you can always provide specialized iterators for handling this one particular case. You end up with more readable and more flexible code (e.g. the dimensions don't have to be constant), at little or no loss of performance.

James Kanze
  • Is there a name for that fundamental optimization? Maybe I could check if my compiler performs this optimization – Paul Renton Jul 21 '13 at 18:33
  • The for loop optimization is viable, but in many cases it might not be possible to perform. For example if the indices are used within the code, or probably even the loops themselves may have additional early termination points. Usually a multi-dim array is used since it is cleaner to express some algorithm by it, which algorithm will obviously use the indices somehow. Well, I am not much into compilers (I usually work with 8-bit embedded CPUs which have simpler compilers), maybe it is possible to more extent than I believe. – Jubatian Jul 21 '13 at 19:28
  • @PaulRenton I don't know about a name, but it was a common optimization back in the 1970's, so I'd be surprised that any compiler today didn't do it. – James Kanze Jul 21 '13 at 20:46
1

Often there is a trade-off between memory consumption and speed. Empirically, I have witnessed that creating an array on the stack is faster than allocating it on the heap. As the array size increases this becomes more apparent.

You can always decrease the memory consumption. For example you can use short or char instead of int etc.

As the array size increases, especially with the use of realloc, there might be a lot more page replacement (up and down) to maintain the contiguous location of items.

You should also consider that there is a limit on the size of the things you can store on the stack; for the heap this limit is much higher, but as I said, at the cost of performance.

sgun
    On the other hand, the maximum size you can allocate on the stack is often very limited; if the array is large enough for performance to be an issue, you may not have a choice. – James Kanze Jul 21 '13 at 18:04
    There's not necessarily any difference in performance. It depends on what you're doing, and how the compiler handles it. – James Kanze Jul 21 '13 at 18:07
  • you are right about stack limit for arrays , James. I was editing my answer accordingly. – sgun Jul 21 '13 at 18:08
-4

Stack memory allocation offers quicker access to data than the heap. The CPU looks for the address in the cache; if it does not find the address in the cache, then it looks it up in main memory. The stack is a preferred location after the cache.

Juniar