
Why is the default alignment of `int64_t` (e.g. `long long`) 8 bytes in 32-bit x86 ABIs? 4-byte alignment would seem to be fine, because it can only be accessed as two 4-byte halves.
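
To see what a given compiler and ABI actually do, a small C11 probe like the one below can be built for each target. The struct `probe` and both helper functions are hypothetical names, and the printed values are implementation-defined, so this sketch does not assume any particular result.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <stdio.h>

static_assert(sizeof(int64_t) == 8, "int64_t is exactly 64 bits");

/* Probe struct: the padding inserted after `c` reveals the alignment the
 * ABI uses for an int64_t struct member. */
struct probe {
    char c;
    int64_t x;
};

/* 8 if the ABI aligns int64_t members to 8 bytes, 4 if it only needs 4. */
size_t int64_member_offset(void)
{
    return offsetof(struct probe, x);
}

void print_int64_alignment(void)
{
    printf("_Alignof(int64_t)         = %zu\n", (size_t)_Alignof(int64_t));
    printf("offsetof(struct probe, x) = %zu\n", int64_member_offset());
}
```

Building the same file with `gcc -m32` and with a 32-bit MSVC toolchain and comparing the output is a quick way to check whether a particular 32-bit ABI uses 8 or 4.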

Peter Cordes
q126y
  • Possible duplicate of http://stackoverflow.com/questions/1054657/: "*The usual rule of thumb (straight from Intel's and AMD's optimization manuals) is that every data type should be aligned by its own size. An `int32` should be aligned on a 32-bit boundary, an `int64` on a 64-bit boundary, and so on. A char will fit just fine anywhere.*" A `long long` is 64 bits in size, so it is best aligned using 8-byte alignment. – Remy Lebeau Dec 29 '15 at 04:34
  • @RemyLebeau Yes, I know that. I was looking for the reasoning behind the rule of thumb. The question you linked mentions that 32-bit processors have a 64-bit data bus, in which case 8-byte alignment makes sense. Thanks! – q126y Dec 29 '15 at 05:06
  • All 32-bit C and C++ compilers I know (but I don't get out much) make an effort to keep 64-bit types aligned to 8. It matters most of all for *double*: accessing one when it is not aligned is very expensive, fat x3 when it straddles a cache line. Whether yours does as well is unclear; it's very odd that SO users keep the name of their compiler a secret. – Hans Passant Dec 29 '15 at 06:07
  • @HansPassant What do you mean by fat x 3? I have a 64-bit compiler. I read the Wikipedia article about alignment requirements, and this was mentioned there. – q126y Dec 29 '15 at 09:38
  • Over 3 times as slow. If you use a 64-bit compiler then a question about a 32-bit x86 architecture is quite irrelevant. – Hans Passant Dec 29 '15 at 09:43
  • If you allowed splitting an ordinary `int64_t`, then you'd need special, new alignment rules for `std::atomic`. – Kerrek SB Dec 29 '15 at 10:27
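
Both the rule of thumb and the point about atomics can be expressed in C11: a natural-alignment check is a mask test on the address, and `_Alignas` requests 8-byte alignment explicitly when code cannot rely on the ABI default. A minimal sketch; `is_naturally_aligned` and `struct counter` are hypothetical names, and the size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* The rule of thumb as an address check: aligned by its own size.
 * `size` must be a power of two, as all x86 scalar sizes are. */
int is_naturally_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Hypothetical struct: _Alignas(8) requests 8-byte alignment for the
 * 64-bit member even on an ABI whose default for int64_t is 4 bytes. */
struct counter {
    char tag;
    _Alignas(8) int64_t value;
};

static_assert(offsetof(struct counter, value) == 8,
              "value starts on an 8-byte boundary");
static_assert(_Alignof(struct counter) >= 8,
              "the struct inherits the 8-byte requirement");
```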

1 Answer


Interesting point: if you only ever load it as two halves into 32-bit general-purpose registers, then 4-byte alignment means each of those operations happens with its natural alignment.
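
In C, that two-halves access pattern can be sketched with two 4-byte `memcpy`s, which compilers typically turn into plain 32-bit loads and stores. `store_as_halves` and `load_as_halves` are hypothetical helpers; the low-half-first layout matches x86's little-endian order for 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a 64-bit value is exactly two 32-bit halves");

/* Store v as two 32-bit halves, low half at the lower address. When p is
 * 4-byte aligned, each half is a naturally aligned 4-byte access. */
void store_as_halves(unsigned char *p, uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    memcpy(p, &lo, sizeof lo);
    memcpy(p + 4, &hi, sizeof hi);
}

/* Reassemble the 64-bit value from its two 32-bit halves. */
uint64_t load_as_halves(const unsigned char *p)
{
    uint32_t lo, hi;
    memcpy(&lo, p, sizeof lo);
    memcpy(&hi, p + 4, sizeof hi);
    return ((uint64_t)hi << 32) | lo;
}
```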

However, it's probably best if both halves of the variable are in the same cache line, since almost all accesses will read or write both halves. Aligning to the natural alignment of the whole thing takes care of that, even ignoring the other reasons below.
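
The cache-line argument can be checked exhaustively. This sketch assumes 64-byte lines (typical of current x86 CPUs, but an assumption, not something the code queries); the helper names are hypothetical. With 8-byte alignment, no position within a line splits an 8-byte object, while with 4-byte alignment the one at offset 60 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size. */
#define LINE_SIZE 64u

/* Nonzero if an object of `size` bytes at `addr` spans two cache lines. */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= LINE_SIZE);
    return (addr / LINE_SIZE) != ((addr + size - 1) / LINE_SIZE);
}

/* Counts the start offsets within one line, stepping by `align`, at which
 * an 8-byte object would be split across two lines. */
unsigned count_split_positions(size_t align)
{
    unsigned splits = 0;
    assert(align > 0);
    for (uintptr_t off = 0; off < LINE_SIZE; off += align)
        splits += (unsigned)crosses_cache_line(off, 8);
    return splits;
}
```

`count_split_positions(8)` returns 0, `count_split_positions(4)` returns 1 (offset 60), and `count_split_positions(1)` returns 7 (offsets 57 to 63).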


32-bit x86 can load a 64-bit integer with a single 64-bit load using MMX or SSE2 `movq`. Handling 64-bit add/sub/shift and bitwise booleans with vector instructions is more efficient (one instruction each), as long as you don't need immediate constants, mul, or div. The vector instructions with 64-bit elements are still available in 32-bit mode.
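
A sketch of the vector approach using the SSE2 intrinsics `_mm_loadl_epi64`, `_mm_add_epi64` and `_mm_storel_epi64` from `<emmintrin.h>`. For a 32-bit build this assumes SSE2 is enabled (e.g. GCC's `-msse2`). The function name `add64` and the scalar fallback are illustrative additions, and the instruction comments describe what these intrinsics usually compile to, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* Adds two 64-bit integers held in memory. With SSE2 the add itself is a
 * single paddq even in 32-bit mode; the scalar fallback keeps the sketch
 * portable (a 32-bit compiler would emit an add/adc pair for it). */
uint64_t add64(const uint64_t *a, const uint64_t *b)
{
    assert(a != NULL && b != NULL);
#ifdef HAVE_SSE2
    __m128i va = _mm_loadl_epi64((const __m128i *)a);  /* typically movq xmm, m64 */
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    __m128i sum = _mm_add_epi64(va, vb);                /* paddq */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, sum);             /* typically movq m64, xmm */
    return out;
#else
    return *a + *b;
#endif
}
```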


Atomic 64-bit compare-and-exchange is also available in 32-bit mode: `lock cmpxchg8b m64` works just like 64-bit mode's `lock cmpxchg16b m128`, using implicit register pairs (`edx:eax` for the expected value, `ecx:ebx` for the replacement). I don't know what penalty it has for crossing a cache-line boundary.
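
From C this instruction is normally reached through C11 `<stdatomic.h>` rather than inline asm. A minimal sketch; `cas64` is a hypothetical wrapper, and the code-generation comment describes typical GCC/Clang output for 32-bit targets of i586 or newer, which is worth confirming in the disassembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically replaces *obj with `desired` if it still holds `expected`.
 * Returns nonzero on success. On 32-bit x86 this typically becomes
 * lock cmpxchg8b (edx:eax = expected, ecx:ebx = desired). */
int cas64(_Atomic uint64_t *obj, uint64_t expected, uint64_t desired)
{
    assert(obj != NULL);
    return atomic_compare_exchange_strong(obj, &expected, desired);
}
```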


Modern x86 CPUs have essentially no penalty for misaligned loads/stores unless they cross a cache-line boundary, which is why I'm only making the cache-line argument rather than saying that misaligned 64-bit data would be bad in general. See the links in the wiki, especially Agner Fog's guides.
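
Since misalignment itself is cheap on such CPUs, the remaining concern in C is correctness: dereferencing a misaligned `uint64_t *` is undefined behaviour. A common idiom, sketched here with a hypothetical helper name, is a fixed-size `memcpy`, which compilers targeting x86-64 generally lower to a single 8-byte `mov`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads a uint64_t from any address without relying on its alignment. */
uint64_t load_u64_any_alignment(const void *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```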

Peter Cordes