16

Our headers use #pragma pack(1) around most of our structs (used for net and file I/O). I understand that it changes the alignment of structs from the default of 8 bytes, to an alignment of 1 byte.

Assuming that everything is run in 32-bit Linux (perhaps Windows too), is there any performance hit that comes from this packing alignment?

I'm not concerned about portability for libraries, but more with compatibility of file and network I/O with different #pragma packs, and performance issues.

Philip Conrad
Nicolas

8 Answers

17

Memory access is fastest when it can take place at word-aligned memory addresses. The simplest example is the following struct (which @Didier also used):

struct sample {
   char a;
   int b;
};

By default, GCC inserts padding, so a is at offset 0, and b is at offset 4 (word-aligned). Without padding, b isn't word-aligned, and access is slower.
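The offsets described above can be checked directly. This is a minimal sketch, assuming a 4-byte int with 4-byte alignment (true for GCC on 32-bit and 64-bit x86); the name sample_packed and the two helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Assumption of this sketch: int is 4 bytes wide. */
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

/* Default layout: the compiler pads after `a` so that `b` is 4-byte aligned. */
struct sample {
    char a;
    int  b;
};

/* Same members under 1-byte packing: no padding, so `b` directly follows `a`. */
#pragma pack(push, 1)
struct sample_packed {
    char a;
    int  b;
};
#pragma pack(pop)

/* Report where `b` lives in each layout. */
size_t offset_of_b_default(void) { return offsetof(struct sample, b); }
size_t offset_of_b_packed(void)  { return offsetof(struct sample_packed, b); }
```

Under those assumptions, offset_of_b_default() returns 4 and offset_of_b_packed() returns 1, while sizeof goes from 8 bytes for struct sample to 5 bytes for struct sample_packed.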

How much slower?

  • For 32-bit x86, according to the Intel 64 and IA32 Architectures Software Developer's Manual:
    The processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.
    As with most performance questions, you'd have to benchmark your application to see how much of an issue this is in practice.
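  • According to Wikipedia, x86 extensions like SSE2 have stricter alignment requirements: the aligned SSE2 load/store instructions (MOVDQA, for example) fault unless their memory operands are 16-byte aligned.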
  • Many other architectures require word alignment (and will generate SIGBUS errors if data structures aren't word-aligned).

Regarding portability: I assume that you're using #pragma pack(1) so that you can send structs across the wire and to and from disk without worrying about different compilers or platforms packing structs differently. This is valid; however, there are a couple of issues to keep in mind:

  • This does nothing to handle big-endian versus little-endian issues. You can handle these by calling the htons family of functions (htons/ntohs for 16-bit values, htonl/ntohl for 32-bit values) on any ints, unsigned, etc. in your structs.
  • In my experience, working with packed, serializable structs in application code isn't a lot of fun. They're very difficult to modify and extend without breaking backwards compatibility, and as already noted, there are performance penalties. Consider transferring your packed, serializable structs' contents into equivalent non-packed, extensible structs for processing, or consider using a full-fledged serialization library like Protocol Buffers (which has C bindings).
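Both points can be combined in one conversion step. The following is a minimal sketch, not a definitive implementation: the wire_record and record types, their fields, and record_from_wire are hypothetical names invented for illustration, and the wire format is assumed to be big-endian. The header <arpa/inet.h> is the POSIX home of ntohl/ntohs; on Windows the same functions come from <winsock2.h>.

```c
#include <arpa/inet.h>  /* ntohl, ntohs (POSIX) */
#include <assert.h>     /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical wire format: big-endian fields, no padding. */
#pragma pack(push, 1)
struct wire_record {
    uint8_t  kind;
    uint32_t id;
    uint16_t count;
};
#pragma pack(pop)

/* Fail the build if the compiler did not produce the expected 7-byte layout. */
static_assert(sizeof(struct wire_record) == 7, "wire layout must be 7 bytes");

/* Naturally aligned struct for application code; free to grow new fields. */
struct record {
    uint32_t id;
    uint16_t count;
    uint8_t  kind;
};

/* Copy field by field, converting from network to host byte order.
   Member access through the packed type lets the compiler emit
   whatever unaligned-safe loads the target needs. */
struct record record_from_wire(const struct wire_record *w)
{
    struct record r;
    r.id    = ntohl(w->id);
    r.count = ntohs(w->count);
    r.kind  = w->kind;
    return r;
}
```

With this split, only record_from_wire touches misaligned fields, and the rest of the program works on the aligned struct record.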
Josh Kelley
  • 1
    +1 for excellent answer and for pointing out that some non-x86 architectures actually *require* proper alignment for certain data types. – Paul R Oct 17 '11 at 12:47
  • Endianness is not actually handled, but it's "OK" since our entire backoffice is Linux driven. I will actually run a benchmark, and maybe report it back here. Thank you for the answer. – Nicolas Oct 17 '11 at 13:22
6

Yes. There absolutely are.

For instance, if you define a struct:

struct dumb {
    char c;
    int  i;
};

then, once the struct is packed, whenever you access the member i, the CPU is slowed, because the 32-bit value i is not accessible in a native, aligned way. To make it simple, imagine that the CPU has to get 3 bytes from memory, and then 1 more byte from the next location, to transfer the value from memory to the CPU registers.
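The "3 bytes plus 1 byte" picture is simple arithmetic on the address. As a rough sketch (crosses_boundary is a hypothetical helper, and real hardware behaviour depends on the CPU and its cache line size), an access spans two units when its starting offset inside the unit plus its size exceeds the unit size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` spans two
   `boundary`-byte units and therefore needs two memory accesses
   in the simplified model described above. */
bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(boundary != 0);
    return (addr % boundary) + size > boundary;
}
```

For a packed struct dumb placed at a 4-byte-aligned address, i sits at offset 1, so crosses_boundary(1, 4, 4) is true: bytes 1 to 3 come from the first word and byte 4 from the next. With default padding i sits at offset 4, and crosses_boundary(4, 4, 4) is false.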

Didier Trosset
  • 36,376
  • 13
  • 83
  • 122
3

When you declare a struct, most compilers insert padding bytes between members to ensure that they are aligned to appropriate addresses in memory (usually each member is placed at an address that is a multiple of its type's size). This lets the compiler generate optimized code for accessing these members.

#pragma pack(1) instructs the compiler to pack structure members with particular alignment. The 1 here tells the compiler not to insert any padding between members.

So yes, there is a definite performance penalty, since you force the compiler to do something beyond what it would naturally do for performance optimization. Also, some platforms demand that objects be aligned at specific boundaries, and using unaligned structures might give you segmentation faults.

Ideally, it is best to avoid changing the default natural alignment rules. But if the 'pragma pack' directive cannot be avoided at all (as in your case), then the original packing scheme must be restored after the definition of the structures that require tight packing.

For example:

//push current alignment rules to internal stack and force 1-byte alignment boundary
#pragma pack(push,1)  

/*   definition of structures that require tight packing go in here   */

//restore original alignment rules from stack    
#pragma pack(pop)
Alok Save
  • 1
    Or better, use gcc's native [`aligned` attribute](http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html) to mark only the current structure. – Blagovest Buyukliev Oct 17 '11 at 12:37
2

It depends on the underlying architecture and the way it handles unaligned addresses.

x86 handles unaligned addresses gracefully, although at a performance cost, while other architectures such as ARM may invoke an alignment fault (SIGBUS), or even "round" the misaligned address to the closest boundary, in which case your code will fail in a hideous way.

Bottom line is, pack it only if you are sure that the underlying architecture will handle unaligned addresses, and if the cost of network I/O is higher than the processing cost.
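One way to sidestep the question of whether the target tolerates unaligned addresses is to avoid multi-byte loads and stores on the I/O buffer altogether. This is a sketch of that alternative, not something the answer above prescribes; put_u32_be and get_u32_be are hypothetical helper names, and big-endian is assumed as the wire order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value at `out` in big-endian order, one byte at a time.
   Byte stores have no alignment requirement, so `out` may be any address. */
void put_u32_be(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read a big-endian 32-bit value from `in`, again byte by byte. */
uint32_t get_u32_be(const uint8_t *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the byte order is fixed by the shifts rather than by the host, the same code produces the same bytes on x86 and ARM, at the cost of a few more instructions per field.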

Blagovest Buyukliev
  • What's your suggestion if the data is to be sent between an ARM and an x86 machine? What pack format should I use? – Benny Sep 12 '13 at 05:35
1

Are there performance issues when using pragma pack(1)?

Absolutely. In January 2020, Microsoft's Raymond Chen posted concrete examples of how using #pragma pack(1) can produce bloated executables that take many, many more instructions to perform operations on packed structures, especially on non-x86 hardware that doesn't directly support misaligned accesses in hardware.

Anybody who writes #pragma pack(1) may as well just wear a sign on their forehead that says “I hate RISC”

When you use #pragma pack(1), this changes the default structure packing to byte packing, removing all padding bytes normally inserted to preserve alignment.

...

The possibility that any P structure could be misaligned has significant consequences for code generation, because all accesses to members must handle the case that the address is not properly aligned.

void UpdateS(S* s)
{
 s->total = s->a + s->b;
}

void UpdateP(P* p)
{
 p->total = p->a + p->b;
}

Despite the structures S and P having exactly the same layout, the code generation is different because of the alignment.

UpdateS                       UpdateP
Intel Itanium

adds  r31 = r32, 4            adds  r31 = r32, 4
adds  r30 = r32, 8 ;;         adds  r30 = r32, 8 ;;
ld4   r31 = [r31]             ld1   r29 = [r31], 1
ld4   r30 = [r30] ;;          ld1   r28 = [r30], 1 ;;
                              ld1   r27 = [r31], 1
                              ld1   r26 = [r30], 1 ;;
                              dep   r29 = r27, r29, 8, 8
                              dep   r28 = r26, r28, 8, 8
                              ld1   r25 = [r31], 1
                              ld1   r24 = [r30], 1 ;;
                              dep   r29 = r25, r29, 16, 8
                              dep   r28 = r24, r28, 16, 8
                              ld1   r27 = [r31]
                              ld1   r26 = [r30] ;;
                              dep   r29 = r27, r29, 24, 8
                              dep   r28 = r26, r28, 24, 8 ;;
add   r31 = r30, r31 ;;       add   r31 = r28, r29 ;;
st4   [r32] = r31             st1   [r32] = r31
                              adds  r30 = r32, 1
                              adds  r29 = r32, 2 
                              extr  r28 = r31, 8, 8
                              extr  r27 = r31, 16, 8 ;;
                              st1   [r30] = r28
                              st1   [r29] = r27, 1
                              extr  r26 = r31, 24, 8 ;;
                              st1   [r29] = r26
br.ret.sptk.many rp           br.ret.sptk.many rp

...
[examples from other hardware]
...

Observe that for some RISC processors, the code size explosion is quite significant. This may in turn affect inlining decisions.

Moral of the story: Don’t apply #pragma pack(1) to structures unless absolutely necessary. It bloats your code and inhibits optimizations.

#pragma pack(1) and its variations are also subtly dangerous - even on x86 systems where they supposedly "work"
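The definitions of S and P are elided in the excerpt above. As a minimal sketch, assume both hold three 32-bit integers (an assumption chosen to match the offsets 0, 4 and 8 visible in the listing, not a copy of the original definitions). The checks below show how two structures can share a layout yet differ in alignment, which is what forces the byte-by-byte code path.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumed definition: default packing. */
struct S {
    int32_t total;
    int32_t a, b;
};

/* Assumed definition: same members under 1-byte packing. */
#pragma pack(push, 1)
struct P {
    int32_t total;
    int32_t a, b;
};
#pragma pack(pop)

/* Identical layout: same size, same member offsets. */
static_assert(sizeof(struct S) == sizeof(struct P), "same size");
static_assert(offsetof(struct S, a) == offsetof(struct P, a), "same offset of a");
static_assert(offsetof(struct S, b) == offsetof(struct P, b), "same offset of b");

/* Different alignment: a P may start at any address, so the compiler
   cannot assume its members are 4-byte aligned. */
static_assert(_Alignof(struct S) == _Alignof(int32_t), "S keeps natural alignment");
static_assert(_Alignof(struct P) == 1, "P may live at any byte address");
```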

Andrew Henle
0

On some platforms such as the ARM Cortex-M0, the 16-bit load/store instructions will fail if used on an odd address, and the 32-bit instructions will fail if used on addresses that are not multiples of four. Loading or storing a 16-bit object from/to an address which might be odd will require using three instructions rather than one; for a 32-bit object, seven instructions would be required.

On clang or gcc, taking the address of a packed structure member will yield a pointer that will often be unusable for purposes of accessing that member. On the more useful Keil compiler, taking the address of a __packed structure member will yield a __packed qualified pointer which can only be stored in pointer objects that are qualified likewise. Accesses made via such pointers will use the multi-instruction sequence necessary to support unaligned accesses.
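With clang or gcc, a common way around the unusable pointer is to never form a typed pointer to the member at all and to copy its bytes into an aligned local instead. This is a sketch under that assumption; the packet type and packet_value function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

#pragma pack(push, 1)
struct packet {
    uint8_t  tag;
    uint32_t value;   /* at offset 1, so usually misaligned */
};
#pragma pack(pop)

/* Avoid:  uint32_t *p = &pkt->value;  *p ...
   `p` carries no hint that it may be misaligned, so dereferencing it
   can fault on targets that require alignment. */
uint32_t packet_value(const struct packet *pkt)
{
    uint32_t v;
    assert(pkt != NULL);
    /* memcpy works on bytes, so it is valid for any source address. */
    memcpy(&v, (const unsigned char *)pkt + offsetof(struct packet, value),
           sizeof v);
    return v;
}
```

Reading pkt->value directly is also fine, because the compiler still sees the packed type at that point; the trouble starts only once the address escapes into a plain uint32_t pointer.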

supercat
0

Technically, yes, it would affect performance, but only with regards to internal processing. If you need the structures packed for network/file IO, there's a balance between the packed requirement and just internal processing. By internal processing, I mean, the work you do on the data between the IO. If you do very little processing, you won't lose much in terms of performance. Otherwise, you may wish to do internal processing on properly aligned structures and only "pack" the results when doing IO. Or you could switch to using only default aligned structures, but you'll need to ensure everyone aligns them the same way (network and file clients).

Ioan
0

There are certain machine code instructions that operate on 32-bit or 64-bit (or even wider) values but expect the data to be aligned on memory addresses. If the data is not aligned, they have to do more than one read/write cycle on memory to perform their task. How big that performance hit is depends heavily on what you are doing with the data. If you build large arrays of structs and perform extensive calculations on them, it might become big. But if you only store data once, just to read it back at some other time, converting it to a byte stream anyway, then it might be barely noticeable.

Ole Dittmann