The quote you're asking about simply isn't true in general. This was also pointed out in a very popular comment in the post:
Actually, there is no assumption that "cacheline_pad_t will itself be aligned to a 64 byte;" alignment is actually not required. The padding just guarantees the only goal, namely that the variables before and after are in different cache lines.
In other words, the optimization is purely about cache line separation: the variables on either side of a pad are kept in different cache lines. Nothing in the compiler, linker, or runtime guarantees that buffer_ and buffer_mask_ are given addresses within a single line.
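To make the distinction concrete, here is a minimal sketch that checks both claims at compile time. The Queue struct is a hypothetical stand-in for the class in the question (pad1_ and enqueue_pos_ are names I made up for illustration), the 64-byte line size is an assumption, and the constexpr loop needs C++14 or later:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Adjust for your target.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for the class in the question.
struct Queue {
    char        pad0_[kCacheLine];
    void*       buffer_;
    std::size_t buffer_mask_;
    char        pad1_[kCacheLine];
    std::size_t enqueue_pos_;
};

// Index of the cache line that contains a given address.
constexpr std::uintptr_t line_of(std::uintptr_t addr) {
    return addr / kCacheLine;
}

// Would the fields on either side of pad1_ share a line if the struct
// started at address `base`?
constexpr bool neighbours_share_line(std::uintptr_t base) {
    return line_of(base + offsetof(Queue, buffer_mask_)) ==
           line_of(base + offsetof(Queue, enqueue_pos_));
}

// Would buffer_ and buffer_mask_ share a line if the struct started at `base`?
constexpr bool hot_fields_share_line(std::uintptr_t base) {
    return line_of(base + offsetof(Queue, buffer_)) ==
           line_of(base + offsetof(Queue, buffer_mask_));
}

// Tries every possible position of the struct relative to a line boundary.
constexpr bool separated_for_every_base() {
    for (std::uintptr_t base = 0; base < kCacheLine; ++base) {
        if (neighbours_share_line(base)) return false;
    }
    return true;
}

// What the padding does guarantee: separation, whatever the alignment.
static_assert(separated_for_every_base(), "pad1_ must separate the fields");

// What it does not guarantee: buffer_ and buffer_mask_ share a line for a
// line-aligned struct, but not when buffer_mask_ lands exactly on a boundary.
static_assert(hot_fields_share_line(0), "line-aligned case");
static_assert(!hot_fields_share_line(2 * kCacheLine -
                                     offsetof(Queue, buffer_mask_)),
              "unlucky base address splits the pair");

// Runtime spot check of the separation guarantee on a live object.
inline void check_live(const Queue& q) {
    auto a = reinterpret_cast<std::uintptr_t>(&q.buffer_mask_);
    auto b = reinterpret_cast<std::uintptr_t>(&q.enqueue_pos_);
    assert(line_of(a) != line_of(b));
}
```

The separation check passes for every base address because the two fields are at least 64 bytes apart. The last static_assert picks a base address that is pointer aligned but not line aligned, and there buffer_ and buffer_mask_ fall on different lines.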
That said, in practice, they probably will be. At least in the snippet shown, the first three variables declared are: pad0_ (64 bytes), buffer_ (pointer size, probably 8 bytes, arch dependent), and buffer_mask_ (probably 8 bytes, arch dependent). All three of these are going to be placed in the .bss section.
When this module is loaded into memory on pretty much any common runtime system, the .bss section will be placed on a page boundary, because the page is the standard unit of memory in an MMU-managed system. Pages are usually at least 4 KiB and cache lines are typically (but not necessarily) 64 bytes, so anything aligned to a page is also aligned to a cache line. So then pad0_ will be page aligned and will occupy exactly one full cache line, which means buffer_ starts a new cache line and occupies only 8 bytes of it, which puts buffer_mask_ immediately after it in the same line.
I'll reemphasize here: none of this is guaranteed to happen, it's just likely based on the most common toolchain + runtime behavior.
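If you'd rather check than assume, the reasoning above can be sketched as a few address tests. The 4096-byte page and 64-byte line sizes are assumptions, and is_aligned, same_cache_line and check_page_aligned_case are hypothetical helper names, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;    // assumption: typical, not universal
constexpr std::size_t kPageSize  = 4096;  // assumption: common minimum page size

// Page alignment only implies line alignment when the page size is a
// multiple of the line size.
static_assert(kPageSize % kCacheLine == 0,
              "page size must be a multiple of the cache line size");

// True if the address of `p` is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// True if both addresses fall into the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}

// Walks through the argument for an object that starts at a page-aligned
// address `base`: a 64-byte pad, then a pointer, then the mask.
inline void check_page_aligned_case(const unsigned char* base) {
    assert(is_aligned(base, kPageSize));
    assert(is_aligned(base, kCacheLine));              // page aligned => line aligned
    const unsigned char* buffer = base + kCacheLine;   // just past the pad
    const unsigned char* mask   = buffer + sizeof(void*);
    assert(is_aligned(buffer, kCacheLine));            // starts a new line
    assert(same_cache_line(buffer, mask));             // pair shares that line
    assert(!same_cache_line(base, buffer));            // pad sits in its own line
}
```

In real code you would pass the addresses of the actual members to same_cache_line instead of computed offsets. A passing check only tells you about that particular build and run, which is exactly the "likely, not guaranteed" situation described above.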
To your second question:
In my opinion, When two members are in the beginning, either attribute or memalign() can help, but what about if these two members are in the middle of a struct?
My best suggestion here would be to take advantage of the layout guarantee that C and C++ give for (standard-layout) structs: fields are placed in memory in the order you declare them. The compiler may insert padding between fields, but it will not reorder them. So if you want fields in the middle of a struct to be aligned in a particular way, you need to pad them manually to force that alignment. Just be aware that different architectures have different cache line sizes, so in general this will not be portable. For example:
#include <stdint.h>

// not cache optimized: someArray2 starts at offset 48,
// so it shares a 64-byte line with someArray1
struct CacheNonOpt {
    uint32_t someArray1[12]; // 48 bytes
    uint32_t someArray2[12]; // 48 bytes
} myNonOptStruct;

// probably cache optimized, assuming you align it:
struct CacheOpt {
    uint32_t someArray1[12]; // 48 bytes
    uint32_t pad[4];         // 16 bytes, pushes someArray2 to offset 64
    uint32_t someArray2[12]; // 48 bytes
} myOptStruct;
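The "assuming you align it" part can be made explicit. Below is a minimal sketch, assuming C++11 or later and a 64-byte cache line, that keeps the manual padding, adds alignas to the struct, and lets the compiler verify the layout. CacheOptAligned and check_instance are hypothetical names for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. This is the non-portable part.
constexpr std::size_t kCacheLine = 64;

// Same manual padding as above, but the struct itself is forced onto a
// cache line boundary.
struct alignas(kCacheLine) CacheOptAligned {
    std::uint32_t someArray1[12]; // 48 bytes, offsets 0..47
    std::uint32_t pad[4];         // 16 bytes, offsets 48..63
    std::uint32_t someArray2[12]; // 48 bytes, offsets 64..111
};

// Compile-time checks: the build fails if the layout ever drifts.
static_assert(alignof(CacheOptAligned) == kCacheLine,
              "struct must be cache line aligned");
static_assert(offsetof(CacheOptAligned, someArray2) == kCacheLine,
              "someArray2 must start on the second cache line");
static_assert(sizeof(CacheOptAligned) == 2 * kCacheLine,
              "size is rounded up to a whole number of lines");

// Runtime check that a given instance really has both arrays starting
// on a cache line boundary.
inline void check_instance(const CacheOptAligned& s) {
    assert(reinterpret_cast<std::uintptr_t>(s.someArray1) % kCacheLine == 0);
    assert(reinterpret_cast<std::uintptr_t>(s.someArray2) % kCacheLine == 0);
}
```

This works as written for objects with static or automatic storage. For heap objects, plain new only honors the over-alignment from C++17 on, and with malloc you would need something like aligned_alloc or posix_memalign instead.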