
I have written a circular buffer with static capacity, so it is backed by a constant-size array. It is owned by a singleton class allocated on the heap. My question is whether the backing array in the circular buffer should be stored inline or allocated separately on the heap. The difference would be:

struct CircularBuffer
{
   std::array<uint64_t, N> buffer;
};

struct CircularBuffer
{
   uint64_t* buffer;
};

To be clear, allocation will happen once at startup, and everything will be on the heap either way, since the enclosing object is heap-allocated. Also, each portion of the buffer will be accessed an equal number of times due to the nature of the data structure and its usage.
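For concreteness, a minimal compilable sketch of the inline-array variant (the push/pop interface here is illustrative, not my actual code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity circular buffer with the backing array stored inline:
// allocating the enclosing object allocates the array along with it.
template <std::size_t N>
class CircularBuffer {
public:
    void push(std::uint64_t v) {
        buffer[head] = v;
        head = (head + 1) % N;      // wrap around at capacity
        if (count < N) ++count;     // a full buffer overwrites the oldest slot
    }
    std::uint64_t pop() {
        assert(count > 0);
        std::size_t tail = (head + N - count) % N;  // index of oldest element
        --count;
        return buffer[tail];
    }
    std::size_t size() const { return count; }
private:
    std::size_t head = 0;
    std::size_t count = 0;
    std::array<std::uint64_t, N> buffer{};  // inline storage, no extra indirection
};
```

The pointer variant would only change the `std::array` member to a `uint64_t*` plus one extra allocation in the constructor; the access pattern is identical.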

Theoretically, what would you consider in making this decision, if your goal is performance?

Of course I will make my final decision based on benchmarks. But as a first pass, the parameters I am considering are:

  • N: Size of buffer
  • R: What percentage of calls to the enclosing object result in reads/writes to the buffer

In practice, N = 1MB, R = 100% (every call results in both a read and a write to the buffer), and I am running single-threaded on a high-end CPU.

Are there other parameters you would consider?

simhendra
  • `std::array buffer;` inside the object and stop worrying about it. – Retired Ninja Jul 07 '23 at 00:19
  • I'm losing sleep, man! – simhendra Jul 07 '23 at 00:23
  • Doesn't really matter where: once allocated, heap, stack, or other, it's the same speed. – Erik Eidt Jul 07 '23 at 01:21
  • @ErikEidt: Making it part of the singleton means no extra indirection to access it, vs. if there was a pointer member in the singleton to a separate heap allocation. Also locality if you tend to access any members of the singleton at similar time to accessing any array elements, especially if the array is small. (dTLB locality if not cache line.) – Peter Cordes Jul 07 '23 at 01:24

2 Answers


For pure performance there is no decisive theoretical argument for either layout, so you should benchmark on the specific architecture you target, as you said. Either way the buffer ends up in the same RAM, and possibly even in the same cache if your CPU has enough of it.

(See this question: Is accessing data in the heap faster than from the stack?)

If you want safety, a std::vector that you never resize is safer than a raw pointer, but it can lead to more bounds checks at runtime, and so less performance.
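To illustrate the bounds-check point: `std::vector::operator[]` is unchecked just like a raw array, while `.at()` is checked and throws on misuse (a sketch, not taken from the question's code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Fixed-size vector used as buffer storage: operator[] is unchecked,
// like a raw array; .at() adds a bounds check and throws on bad indices.
inline bool out_of_range_is_caught() {
    std::vector<std::uint64_t> buffer(8, 0);  // sized once, never resized
    buffer[3] = 42;                           // unchecked access, no cost beyond the array case
    try {
        buffer.at(100);                       // checked access: throws std::out_of_range
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Hardening modes (e.g. libstdc++'s `_GLIBCXX_ASSERTIONS`) can also add checks to `operator[]`, which is where the extra runtime cost would come from.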

I don't know what your application is, but to me this looks like premature optimization (maybe it isn't, but consider that it could be), as I don't expect much difference either way. And when the performance difference doesn't matter much, the cleaner code is usually the wiser one to write.

As far as other considerations go, eating up 1MB of the stack can be fine, but if you plan to use more of the stack elsewhere it can become a problem, as the stack is far more limited than the heap for large allocations (typically a few MB in total).

  • Accessing the data will cost the same; chasing down the pointer to find the data, that's another kettle of fish. – user4581301 Jul 07 '23 at 01:53

Making it part of the singleton means no extra indirection to access it, vs. if there was a pointer member in the singleton pointing to a separate heap allocation.

Since you have a heap-allocated singleton (instead of a static or global variable), there's already a level of indirection to reach the class object itself with a pointer to it stored somewhere.

You could consider making the array a static member, which you could change later if you ever want multiple circular buffers in the same application instead of a singleton. On x86-64 in a non-PIE Linux executable, indexing an array in static storage can be done with [disp32 + reg], which can be cheaper than a 2-register addressing mode ([reg + offsetof(CircularBuffer, arr) + reg*8]), at least on Intel CPUs where indexed addressing modes can un-laminate micro-fused uops, especially for AVX instructions. But that doesn't apply if you build a normal PIE executable or shared library. On other ISAs, generating the base address of a static array in a register takes extra instructions vs. just using the pointer to the class object that you're already going to need in a register.
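A sketch of the static-member idea, assuming C++17 `inline static`; the capacity constant is a made-up value matching the question's 1MB figure:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 1 MiB of uint64_t, per the question's N = 1MB.
constexpr std::size_t kCapacity = (1u << 20) / sizeof(std::uint64_t);

class CircularBuffer {
public:
    // C++17 inline static: the array lives in static storage (the BSS),
    // not inside each object, so indexing can use a static base address
    // instead of an offset from the object pointer.
    inline static std::array<std::uint64_t, kCapacity> buffer{};
    std::size_t head = 0;
};
```

Note the object itself stays tiny, since the array is no longer part of its layout; the trade-off is that there can only ever be one buffer.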


Putting the array in the object gives locality if you tend to access any members of the singleton at similar time to accessing any array elements, especially if the array is small. (dTLB locality if not cache line, at least for elements near the start of the array.) Put the array member last so other member vars are close to each other.
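The member-ordering advice can be sketched like this (the scalar member names are assumptions, and exact padding is implementation-defined):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct CircularBuffer {
    // Small, hot members first: they share the object's first cache line.
    std::size_t head = 0;
    std::size_t count = 0;
    // Large array last, so the scalars above stay close together.
    std::array<std::uint64_t, 1024> buffer{};
};

// The array starts after the scalar members, not before them.
static_assert(offsetof(CircularBuffer, buffer) >= 2 * sizeof(std::size_t),
              "scalar members come before the array");
```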

It's also less code (compiler-generated assembly) to get it allocated; just one allocation. This benefit also applies to using static storage; it's already reserved in the BSS.

Peter Cordes