There are many great threads on how to align structs to the cache line (e.g., "Aligning to cache line and knowing the cache line size").
Imagine you have a system with a 256B cache line size and a struct of size 17B (e.g., a tightly packed struct with two `uint64_t` and one `uint8_t`). If you align the struct to the cache line size, every instance sits in its own cache line, so accessing an instance loads exactly one cache line.
For machines with a cache line size of 32B or maybe even 64B, this is good for performance: an instance never crosses a cache line boundary, so we never have to fetch two cache lines for a single instance.
However, on the 256B machine, this wastes a lot of memory (239 of every 256 bytes are padding) and results in unnecessary loads when iterating through an array/vector of this struct. In fact, 15 instances of the struct would fit in a single cache line (15 × 17B = 255B).
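To make the numbers concrete, here is a minimal sketch of the two layouts. It assumes a 256B cache line; `kCacheLine`, `Small`, and `Aligned` are hypothetical names for illustration, and `#pragma pack` is a compiler extension (supported by GCC, Clang, and MSVC), not standard C++. Support for an over-aligned `alignas(256)` is also implementation-defined, although mainstream compilers accept it.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the 256B cache line size from the example above.
constexpr std::size_t kCacheLine = 256;

// Tightly packed variant: 8 + 8 + 1 = 17 bytes, alignment 1.
#pragma pack(push, 1)
struct Small {
    std::uint64_t a;
    std::uint64_t b;
    std::uint8_t  c;
};
#pragma pack(pop)

// Cache-line-aligned variant: same members, but sizeof is rounded up
// to a multiple of the alignment, so each instance occupies a full line.
struct alignas(kCacheLine) Aligned {
    std::uint64_t a;
    std::uint64_t b;
    std::uint8_t  c;
};

static_assert(sizeof(Small) == 17, "packed struct is 17 bytes");
static_assert(sizeof(Aligned) == kCacheLine, "aligned struct fills one line");

// How many packed instances fit in one line without crossing its boundary.
constexpr std::size_t kPerLine = kCacheLine / sizeof(Small);
static_assert(kPerLine == 15, "15 instances per 256B line");

// Bytes of padding per instance in the aligned variant.
constexpr std::size_t kWastedPerInstance = sizeof(Aligned) - sizeof(Small);
static_assert(kWastedPerInstance == 239, "239 bytes of padding per instance");
```

The `static_assert`s are checked at compile time, so the block needs no `main` to verify the arithmetic.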
My question is two-fold:
- In C++17 and above, I can align to the cache line size using `alignas`. However, it is unclear to me how to force a layout along the lines of "put as many instances in a cache line as possible without crossing the cache line boundary, then start at the next cache line". So something like this:
  [Figure: the upper box is a cache line and the other boxes are instances of our small struct.]
- Do I actually want this? I cannot really wrap my head around it. Usually, we say that aligning a struct to the cache line size makes access faster, because we only ever have to load a single cache line per instance. However, given my example, I wonder whether this is actually true. Wouldn't it be faster to drop the per-instance alignment and instead store many more instances in a single cache line?
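For concreteness, the layout from the first bullet could be approximated by hand, roughly like the sketch below. This is only an illustration under stated assumptions, not a claim that it is the right or fastest approach: the 256B line size is assumed, `Line`, `LinePackedArray`, `line_of`, and `slot_of` are hypothetical names, and `#pragma pack` is a compiler extension. The idea is to align a block of 15 packed instances to the cache line instead of aligning each instance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 256B cache line, as in the example above.
constexpr std::size_t kCacheLine = 256;

#pragma pack(push, 1)
struct Small {               // 17 bytes, alignment 1
    std::uint64_t a;
    std::uint64_t b;
    std::uint8_t  c;
};
#pragma pack(pop)

constexpr std::size_t kPerLine = kCacheLine / sizeof(Small);  // 15

// One cache line worth of instances: 255 bytes used, 1 byte tail padding.
struct alignas(kCacheLine) Line {
    Small items[kPerLine];
};
static_assert(sizeof(Line) == kCacheLine, "a Line is exactly one cache line");

// Map a flat index to (cache line, slot within that line).
constexpr std::size_t line_of(std::size_t i) { return i / kPerLine; }
constexpr std::size_t slot_of(std::size_t i) { return i % kPerLine; }

// Hypothetical array-like wrapper; no element ever straddles a line boundary.
class LinePackedArray {
public:
    explicit LinePackedArray(std::size_t n)
        : lines_((n + kPerLine - 1) / kPerLine), size_(n) {}

    Small& operator[](std::size_t i) {
        return lines_[line_of(i)].items[slot_of(i)];
    }
    const Small& operator[](std::size_t i) const {
        return lines_[line_of(i)].items[slot_of(i)];
    }
    std::size_t size() const { return size_; }

private:
    // Since C++17, std::allocator honors the over-aligned element type.
    std::vector<Line> lines_;
    std::size_t size_;
};
```

With this layout, index 14 is the last slot of the first line and index 15 is the first slot of the second line. The price is a division and a modulo per indexed access, plus unaligned `uint64_t` members inside the packed struct, so whether it actually beats a plain packed array is exactly what I am unsure about.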
Thank you so much for your input here. It is much appreciated.