10

According to the Wikipedia page Segmentation fault, a bus error can be caused by unaligned memory access. The article gives an example of how to trigger a bus error. In the example, we have to enable alignment checking to see the bus error. What if we disable such alignment checking?

The program seems to work properly. I have a program that accesses unaligned memory frequently, and it is used by quite a few people, but no one has reported bus errors or other weird results to me. If we disable alignment checking, what are the side effects of unaligned memory access?

Platforms: I am working on x86/x86-64. I also tried my program by compiling it with "gcc -arch ppc" on a Mac and it works properly.

Peter Mortensen
user172818
  • What is the platform you are working on?? – Frank Bollack Sep 30 '09 at 08:51
  • Pavel Minaev largely answers my question. I am working on x86/x86_64. I tried my program by compiling it with "gcc -arch ppc" on Mac and it works properly. – user172818 Sep 30 '09 at 09:01
  • Note that unaligned memory access (actually, even just pointer assignment) is undefined behaviour according to the C standard - so a compliant compiler is allowed to do *anything* if you do it (though not all compilers will take that liberty). – sleske Jan 28 '15 at 23:04
  • Related: Violating `alignof(T)` is undefined behaviour and can cause real-world problems even on x86, for example when auto-vectorizing the compiler may assume that a 16-byte alignment boundary is some whole number of `short`s away : [Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?](https://stackoverflow.com/q/47510783) – Peter Cordes Aug 19 '20 at 09:04

3 Answers

14
  1. It may be significantly slower to access unaligned memory (as in, several times slower).

  2. Not all platforms even support unaligned access - x86 and x64 do, but ia64 (Itanium) does not, for example.

  3. A compiler can emulate unaligned access (VC++ does that for pointers declared as __unaligned on ia64, for example) - by inserting additional checks to detect the unaligned case, and loading/storing parts of the object that straddle the alignment boundary separately. That is even slower than unaligned access on platforms which natively support it, however.
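
To illustrate point 3, here is a minimal sketch in C of reading a 32-bit value from a possibly unaligned address without ever issuing an unaligned load. The helper names `load_u32_le_bytes` and `load_u32_memcpy` are made up for this example, and this is not the code VC++ actually generates for `__unaligned`; it only shows the general idea of loading the pieces separately.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit little-endian value from any address
   by loading each byte on its own, so no single load can straddle an
   alignment boundary. */
static uint32_t load_u32_le_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copy the bytes into an aligned local variable.
   The result is in the host's byte order. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, with `unsigned char buf[] = {0x11, 0x22, 0x33, 0x44, 0x55}`, `load_u32_le_bytes(buf + 1)` yields `0x55443322` regardless of the platform's alignment rules or byte order.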

sleske
Pavel Minaev
  • Thanks. Few users of my program are working on ia64. Maybe that is why I have not received any bug reports. – user172818 Sep 30 '09 at 09:04
  • You might also add #4: an OS can emulate unaligned access on behalf of an application by catching the processor exception and fixing it up (kind of like what happens for a page fault). This is slower than the compiler performing unaligned fix-ups in the generated code. Windows can support this on ia64. – Michael Burr Sep 30 '09 at 14:03
  • This answer was referenced in the blog post *[Data alignment for speed: myth or reality?](http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/)*. – Peter Mortensen Jun 07 '12 at 16:24
  • This answer could do with some more specifics. E.g., data alignment is relevant on 32/64-bit boundaries for scalar operations, and 128-bit boundaries for SIMD operations on x86. More importantly, you might want to point out that the biggest cost is for an operation to straddle cache lines – awdz9nld May 29 '13 at 22:52
  • There's even a counterexample where unaligned data is useful for packing as much as possible into the CPU cache to avoid cache misses: http://danluu.com/3c-conflict/ Too bad there is no comments section there, because I'd like to hear what others think about this. I would like to test this myself at some point. And of course this depends on the processor architecture. – leetNightshade Jan 03 '14 at 23:28
  • I changed "it is significantly slower" to "it may be...", as the speed penalty depends on many factors, among them processor type and cache effects. – sleske Jan 28 '15 at 23:03
  • @leetNightshade The article is very interesting and useful. But as I see it, it does not talk about unaligned data in the sense of alignment related to read size. It talks about cache-related alignment, which occurs in somewhat higher address bits. To visualise, if a 32-bit address consists of TTTT TTTT TTTT TTTT TTTT SSSS SSXX XYYY, then the current question talks only about Y and nothing else. Nobody cares to talk about X and T, and the very interesting article you refer to talks only about S. As an additional comment, the length of the S bits depends on the cache size and cache level, of course. – Roland Pihlakas May 23 '15 at 12:25
6

It very much depends on the chip architecture. x86 and POWER are very forgiving, whereas SPARC, Itanium and VAX throw different exceptions.

James Anderson
  • It does indeed depend on the processor. I recently worked on a DSP that will happily proceed by using the closest aligned memory address when asked to operate on an unaligned one. Debug *that*, you perverted unaligned memory accessing individual. – Dan Moulding Sep 30 '09 at 12:18
  • Indeed, why bother even looking at those last few bits at all - Real Men know what they're doing, anyway :) On the other hand, it would be a convenient architecture to use tagged pointers on, if you use the ignored bits for the tag... – Pavel Minaev Sep 30 '09 at 16:23
  • @Pavel: consider ARM/Thumb interworking. The lsb of the "address" of an instruction indicates whether the CPU should enter Thumb mode (1) or ARM mode (0) prior to executing code from that address. Either way, the actual bytes of the target instruction are located in memory at address&~1; it's just that copying a value to the program counter potentially switches mode as well as jumping. – Steve Jessop Sep 30 '09 at 18:21
2

Consider the following example I have just tested on ARM9:

//Addresses       0     1     2    3     4     5     6     7      8    9
U8 u8Temp[10] = {0x11,0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0x00};

U32 u32Var;

u32Var = *((U32*)(u8Temp+1));  // Let's read four bytes starting from 0x22 (address 1)

// You would expect u32Var to have the value 0x55443322 here (assuming little endian).
// But in reality u32Var will be 0x11443322!
// This is because the address we are accessing is not a multiple of 4.
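
One way to see where 0x11443322 comes from, assuming the core behaves like older ARM parts (ARMv4/ARMv5) with alignment faults disabled: the word load ignores the low two address bits, fetches the aligned word, and rotates it right by 8 bits for each byte of misalignment. The following is a rough software model of that behaviour, not vendor code; `simulated_arm_ldr` is a name made up for this sketch, and `mem` stands for memory starting at address 0.

```c
#include <stdint.h>

/* Hypothetical model of an unaligned word load on an older ARM core:
   read the little-endian word at the address rounded down to a multiple
   of 4, then rotate it right by 8 * (addr & 3) bits. */
static uint32_t simulated_arm_ldr(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~(uint32_t)3;          /* drop the low two bits */
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    unsigned rot = 8u * (addr & 3u);              /* 0, 8, 16 or 24 */

    /* Guard rot == 0 so we never shift a 32-bit value by 32. */
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}
```

With the array from the example placed at address 0, `simulated_arm_ldr(u8Temp, 1)` gives 0x11443322, matching the observed value, while an aligned read at address 0 gives 0x44332211.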
Potzon
  • I think you have a spelling error - your third statement references `u16Temp` variable, where is its declaration? I only see `u8Temp`. – Armen Michaeli Jun 29 '13 at 16:06