I'm trying to implement a custom allocator that works with std containers, based on the requirements here: https://en.cppreference.com/w/cpp/named_req/Allocator
I'm currently implementing a linear allocator and I'm having a hard time with memory alignment.
After I allocate a block of memory, I'm wondering how much padding I need between the objects in the block to optimize CPU reads/writes.
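To make the question concrete, this is roughly the padding computation I have in mind. It's only a sketch: `padding_for` is a name I made up for illustration, and it assumes the alignment is a power of two. The part I don't know is which value to pass as `alignment`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from any library): number of padding bytes needed
// to round `address` up to the next multiple of `alignment`.
// Assumption: `alignment` is a non-zero power of two.
constexpr std::size_t padding_for(std::uintptr_t address, std::size_t alignment) {
    return static_cast<std::size_t>(
        ((address + (alignment - 1)) & ~static_cast<std::uintptr_t>(alignment - 1))
        - address);
}
```

For example, with the next free address at 13 and an alignment of 8, the object would start at 16, so 3 bytes of padding are skipped.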
I'm not sure whether the address of each object should be divisible
- by the CPU word size (4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine)
- by sizeof(T)
- by alignof(T)
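Here is a small sketch of how the three candidates can differ for a single type. `Vec3i` and the constant names are made up for illustration, and I'm using `sizeof(void*)` as a stand-in for the CPU word size, which is an assumption and not something the language guarantees.

```cpp
#include <cstddef>

// Hypothetical example type: three ints.
struct Vec3i { int x, y, z; };

// The three candidate values from the list above.
// Assumption: pointer size approximates the CPU word size.
constexpr std::size_t word_candidate  = sizeof(void*);
constexpr std::size_t size_candidate  = sizeof(Vec3i);
constexpr std::size_t align_candidate = alignof(Vec3i);

// sizeof(T) is always a whole multiple of alignof(T), so that objects
// in an array of T are each correctly aligned.
static_assert(size_candidate % align_candidate == 0,
              "sizeof must be a multiple of alignof");
```

On a typical 64-bit build with 4-byte int I would expect these to be 8, 12 and 4 respectively, although the exact values are platform-dependent. That is three different answers for the same type, which is why I'm confused.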
I've read different answers in different places.
For example, in this question the accepted answer says:
The usual rule of thumb (straight from Intels and AMD's optimization manuals) is that every data type should be aligned by its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere.
So by that answer it looks like the address should be divisible by sizeof(T).
In this question, the second answer states that:
The CPU always reads at its word size (4 bytes on a 32-bit processor), so when you do an unaligned address access — on a processor that supports it — the processor is going to read multiple words.
So by that answer it looks like the address should be divisible by the CPU word size.
So I'm seeing conflicting statements on how to align data for optimal CPU reads/writes, and I'm not sure whether I'm misunderstanding something or whether some of these answers are wrong. Could someone clear up what the address should be divisible by?