3

I'm currently trying to write code that is supposed to work on a wide range of machines, from handheld devices and sensors to big servers in data centers.

One of the (many) differences between these architectures is the requirement for aligned memory access.

Aligned memory access is not required on "standard" x86 CPUs, but many other CPUs need it and raise an exception if the rule is not respected.

Up to now, I've been dealing with it by forcing the compiler to be cautious on specific data accesses that are known to be risky, using the packed attribute (or pragma). And it works fine.

The problem is, the compiler is so cautious that a lot of performance is lost in the process.

Since performance is important, we would be better off rewriting some portions of the code to work specifically on strict-alignment CPUs. Such code would, on the other hand, be slower on CPUs that support unaligned memory access (such as x86), so we want to use it only on CPUs that require strictly aligned memory access.

And now the question: how can I detect, at compile time, that the target architecture requires strictly aligned memory access? (Or the other way round.)

Cyan

2 Answers

5

Writing your code for strict memory alignment is a good idea anyway. Even on x86 systems, which allow unaligned access, an unaligned read or write can cause two memory accesses (for example when it straddles a cache line), and some performance will be lost. It's not difficult to write efficient code that works on all CPU architectures. The simple rule to remember is that the pointer must be aligned to the size of the object you're reading or writing, e.g. if writing a DWORD, then ((uintptr_t)dest_pointer & 3) == 0. Using a crutch such as "UNALIGNED_PTR" types will cause the compiler to generate inefficient code. If you've got a large amount of legacy code that must work immediately, then it makes sense to use the compiler to "fix" the situation, but if it's your code, then write it from the start to work on all systems.
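The alignment rule above can be sketched in C roughly as follows. The helper name `is_aligned` is hypothetical (it does not come from the answer), and the cast to `uintptr_t` assumes the usual flat mapping between pointers and integers. Note the parentheses: in C, `==` binds tighter than `&`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`.
   `alignment` must be a power of two (1, 2, 4, 8, ...). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Parentheses matter: `p & 3 == 0` would parse as `p & (3 == 0)`. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For a DWORD write, the check would be `is_aligned(dest_pointer, 4)`.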

BitBank
  • Unfortunately, there are situations where I have no choice, and am provided with a pointer which *may* not be aligned. – Cyan Feb 18 '12 at 00:50
  • You can test if the pointer is aligned with this (assuming objectsize is a power of two): if ((uintptr_t)pointer & (objectsize - 1)) {take unaligned action} – BitBank Feb 18 '12 at 00:52
  • Sure, but the test itself is much too slow. Better to access memory one byte at a time and reconstruct the data from these bytes. Obviously, this conclusion is completely wrong on CPUs that support unaligned access, such as x86. – Cyan Feb 18 '12 at 00:59
  • The test is not necessarily slow on CPUs with conditional execution (ARM). It sounds like the upstream code needs to be cleaned up to not work with unaligned pointers. Also assuming that the compiler will clean things up can lead to problems. Not all compilers support the "UNALIGNED_PTR" type. – BitBank Feb 18 '12 at 01:01
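The byte-at-a-time approach mentioned in the comments might look like the sketch below. The function name `read_le32` and the choice of little-endian byte order are assumptions made for illustration; neither comes from the thread.

```c
#include <stdint.h>

/* Hypothetical helper: reads a 32-bit little-endian value from any address,
   aligned or not, using only single-byte loads. */
static uint32_t read_le32(const void *src)
{
    const unsigned char *b = (const unsigned char *)src;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Because it never issues a multi-byte load, this form cannot trigger an alignment fault, at the cost of extra instructions on CPUs that would have handled a plain unaligned load.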
5

No C implementation that I know of provides any preprocessor macro to help you figure this out. Since your code supposedly runs on a wide range of machines, I assume that you have access to a wide variety of machines for testing, so you can figure out the answer with a test program. Then you can write your own macro, something like below:

#if defined(__sparc__)
/* Unaligned access will crash your app on a SPARC */
#define ALIGN_ACCESS 1
#elif defined(__ppc__) || defined(__POWERPC__) || defined(_M_PPC)
/* Unaligned access is too slow on a PowerPC (maybe?) */
#define ALIGN_ACCESS 1
#elif defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
/* x86 / x64 are fairly forgiving */
#define ALIGN_ACCESS 0
#else
#warning "Unsupported architecture"
#define ALIGN_ACCESS 1
#endif
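One way to consume such a macro is to pick an access routine at compile time. In the sketch below, `load_le32` is a hypothetical name, and the fallback definition of `ALIGN_ACCESS` is a shortened stand-in for the detection block above, included only so the example compiles on its own.

```c
#include <stdint.h>
#include <string.h>

/* Shortened stand-in for the detection block, so this compiles alone. */
#ifndef ALIGN_ACCESS
#  if defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
#    define ALIGN_ACCESS 0
#  else
#    define ALIGN_ACCESS 1
#  endif
#endif

/* Hypothetical helper: loads a 32-bit little-endian value from any address. */
static uint32_t load_le32(const void *src)
{
#if ALIGN_ACCESS
    /* Strict-alignment path: single-byte loads only. */
    const unsigned char *b = (const unsigned char *)src;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
#else
    /* Relaxed path: memcpy usually compiles to one plain load. This branch
       assumes a little-endian host, which holds for x86 / x64. */
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
#endif
}
```

Callers use `load_le32` everywhere and never see which path was compiled in, which keeps the architecture-specific decision in one place.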

Note that the speed of an unaligned access will depend on the boundaries which it crosses. For example, if the access crosses a 4k page boundary it will be much slower, and there may be other boundaries which cause it to be slower still. Even on x86, some unaligned accesses are not handled by the processor and are instead handled by the OS kernel. That is incredibly slow.
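To make the boundary point concrete, here is a minimal sketch of a check for whether an access straddles a boundary. The helper name `crosses_boundary` is hypothetical, and the sizes in the usage note (4096-byte pages, 64-byte cache lines) are typical values assumed for illustration, not guarantees for any given CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the `size` bytes starting at address
   `addr` fall into two different `boundary`-sized blocks.
   Requires size > 0 and boundary > 0. */
static int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    uintptr_t last = addr + (size - 1);   /* address of the final byte */
    return (addr / boundary) != (last / boundary);
}
```

For example, a 4-byte access at address 4094 crosses a 4096-byte page boundary, while the same access at 4092 does not.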

There is also no guarantee that a future (or current) implementation will not suddenly change the performance characteristics of unaligned accesses. This has happened in the past and may happen in the future; the PowerPC 601 was very forgiving of unaligned access but the PowerPC 603e was not.

Complicating things even further is the fact that the code you'd write to make an unaligned access would differ in implementation across platforms. For example, on PowerPC it's simplified by the fact that x << 32 and x >> 32 are always 0 if x is 32 bits, but on x86 you have no such luck: the shift count is masked to 5 bits, so a shift by 32 leaves x unchanged.
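To illustrate the shift issue: a common way to emulate an unaligned load is to combine two aligned words with complementary shifts. When the offset is zero, one of the shifts equals the word width, and in C a shift of a 32-bit value by 32 is undefined behavior whatever the target does in hardware, so portable code has to special-case it. The sketch below uses a hypothetical helper, `combine_words`, and works on values rather than memory, so it does not depend on byte order.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 32 bits that start `shift` bits into the
   64-bit quantity formed by hi:lo. `shift` must be in the range 0..31. */
static uint32_t combine_words(uint32_t lo, uint32_t hi, unsigned shift)
{
    if (shift == 0)
        return lo;   /* avoids `hi << 32`, which is undefined in C */
    return (lo >> shift) | (hi << (32 - shift));
}
```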

Dietrich Epp
  • Interesting. That's indeed the way I've started to do it, but obviously, I can't know all the architectures out there... And btw, unfortunately no, I have no "direct" access to all these architectures, so I must guess most of the issues in advance, in order to avoid too many complications at the code maintenance stage. – Cyan Feb 18 '12 at 00:52
  • You can't know all of the architectures out there, but it is irresponsible to claim to support an architecture that you don't have access to, at least if you're writing C. – Dietrich Epp Feb 18 '12 at 01:55
  • I fully agree. My objective is mostly to "prepare" the package for others to customize it for their own target architecture. The better it is prepared, the easier it will be downstream. But as you say, I can't "claim" to have fully validated it. Still, good software design has benefits compared to "I don't care". – Cyan Feb 20 '12 at 10:03