
When accessing a structure from a byte stream (file, network, etc.), what does alignment mean?

For example, I can understand why a compiler would pad the following structure with extra bytes to align the int a and the short b at word addresses (multiples of 4). However, what does this mean when accessing memory at an arbitrary address through a pointer? Does using the -> operator generate inefficient code? Or am I missing something?

typedef struct{
    void*   ptr;  //4 bytes
    char    c1;   //1 byte
    int     a;    //4 bytes
    char    c2;   //1 byte
    short   b;    //2 bytes
    char    c3;   //1 byte
} Odd_Struct;     //Minimum needed = 13 bytes, actual (with padding) = 20

unsigned char buffer[128];
Odd_Struct odd_struct;

odd_struct.a = 123456789;
odd_struct.b = 12345;

printf("sizeof(odd_struct): %zu\n", sizeof(Odd_Struct)); // sizeof yields size_t, so use %zu

memcpy(buffer+3, &odd_struct, sizeof(Odd_Struct));

Odd_Struct* testPtr = (Odd_Struct*)(buffer+3);

printf("testPtr->a: %d\n", testPtr->a);
printf("testPtr->b: %d\n", testPtr->b);

And the output

sizeof(odd_struct): 20
testPtr->a: 123456789
testPtr->b: 12345

To answer why I would want to do this:

I intend to use a system with very limited RAM, so it's tempting to just cast a byte (unsigned char) pointer to a struct pointer and access the data that way, without an additional copy in memory, i.e. use the bytes in place. This works fine on an x86 PC using gcc. But based on the comments below, this seems like it might be a bad idea.

Derek
  • C11 draft standard n1570, *6.3 Conversions, 6.3.2.3 Pointers 7 A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.[...]* – EOF Apr 21 '16 at 21:48
  • It could give a Bus Error when trying to use read or write access to the field value. – jdarthenay Apr 21 '16 at 21:57
  • It is not clear what you mean. Why (and how) would it make a difference where the **contents** of an object originates? If you access an object of one type via a pointer of a different type, you invoke undefined behaviour. If you are about serialisation, simple and only portable solution is to use marshalling with bitshift/bitops. – too honest for this site Apr 21 '16 at 22:04
  • "If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined" So if I read this right, not only is this code inefficient, it's not guaranteed to work at all? It works in gcc... why? Would it not work on other compilers/architectures? – Derek Apr 21 '16 at 22:11
  • I could suggest some alternative code if you give a bit more context (e.g. explain why are you memcpying this struct to a random place in a char buffer) – M.M Apr 21 '16 at 23:27

2 Answers

Alignment means that the implementation may place restrictions on the addresses at which you can access or point to an object of a certain type. This page describes why processors might impose this restriction in order to improve performance.

You can inspect the alignment requirement of a type (since C11) by checking _Alignof(Odd_Struct).

If this does not equal 1, then the code (Odd_Struct*)(buffer+3) may cause undefined behaviour. Whether or not it actually does cause UB depends on whether buffer+3 happens to be a multiple of the alignment requirement.

The following code is correct (well - technically the possibility exists that it is not, but the standard intends that uintptr_t behave sensibly):

int req = _Alignof(Odd_Struct);
if ((uintptr_t)(buffer+3) % req)
    printf("Would be undefined behaviour.\n");
else
{
    Odd_Struct* testPtr = (Odd_Struct*)(buffer+3);

    printf("testPtr->a: %d\n", testPtr->a);
    printf("testPtr->b: %d\n", testPtr->b);
}

In theory a compiler could detect a potential unaligned access and generate different assembly code to simulate accessing the value as you intend. I don't know of any compiler which actually does this though.

Typically the compiler will assume the access is correctly aligned and generate the right assembly for that case only. The behaviour then depends on the processor. For example, some ARM CPUs (particularly older cores) raise a hardware trap on an unaligned access, while Intel CPUs perform the access in hardware using a slower technique, as described on the page I linked earlier.

Some CPUs might even trap or silently load an incorrect address as soon as you try to load the unaligned address into an address register.

To write robust code you should make no assumptions about how undefined behaviour might manifest itself; instead, avoid writing code with undefined behaviour in the first place.
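One way to avoid the undefined behaviour discussed above is to copy the bytes into a correctly aligned object with memcpy instead of casting the pointer. The following is only a minimal sketch: the helper names load_odd_struct and load_field_a are hypothetical, and it assumes the bytes in the buffer were produced from an Odd_Struct on the same platform (same padding, sizes and byte order), as in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    void *ptr;
    char  c1;
    int   a;
    char  c2;
    short b;
    char  c3;
} Odd_Struct;

/* Hypothetical helper: copy a whole struct out of a byte buffer.
   src may have any alignment; the returned object is properly aligned. */
Odd_Struct load_odd_struct(const unsigned char *src)
{
    Odd_Struct out;
    assert(src != NULL);
    memcpy(&out, src, sizeof out);
    return out;
}

/* Hypothetical helper: read only the field a, so just sizeof(int)
   bytes of temporary storage are needed instead of a full struct. */
int load_field_a(const unsigned char *src)
{
    int a;
    assert(src != NULL);
    memcpy(&a, src + offsetof(Odd_Struct, a), sizeof a);
    return a;
}
```

With the buffer from the question, load_field_a(buffer + 3) would be used in place of testPtr->a. If RAM is the concern, the per-field variant keeps the extra storage to a few bytes, and optimizing compilers commonly turn such small fixed-size memcpy calls into plain load instructions where the target allows it, though that is not guaranteed.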

M.M
  • Thanks for pointing out _Alignof(). FYI, on my system _Alignof(Odd_Struct) returns 4, and buffer+3 is not a multiple of 4, so "Would be undefined behaviour." is printed. However, the code still executes as expected even though the Odd_Struct is out of alignment. I see now that this is not guaranteed to work on all systems. – Derek Apr 21 '16 at 23:28

Thanks to EOF's comments I was able to find two other similar questions:

  • Is converting between pointer-to-T, array-of-T and pointer-to-array-of-T ever undefined behaviour?
  • Unaligned access through reinterpret_cast

This code works because, although the behavior is undefined, the x86 PC I am using to test happens to support unaligned memory accesses.

However, this code is not portable, and not even guaranteed to work with future versions of gcc (as gcc may optimize the instructions to include an instruction that requires alignment).

In short, it is a bad idea to do this, even though it may be a tempting way to save a few bytes of memory.
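The comments suggest marshalling with bit shifts as the portable alternative. A minimal sketch of that idea follows; the function names are hypothetical, and it assumes the byte stream stores integers in little-endian order, which is a choice the stream format has to define. Because the bytes are read one at a time, the source address needs no particular alignment, and the result does not depend on the host's byte order or struct padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit little-endian value from any address. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode a 16-bit little-endian value from any address. */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Hypothetical helper: encode a 32-bit value as 4 little-endian bytes. */
void write_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

For example, after write_u32_le(buffer + 3, 123456789), the call read_u32_le(buffer + 3) yields the same value back on any platform, with no extra copy of the whole struct.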

Derek