
I came across the following C code:

union packet_t {
    uint8_t raw[10];
    struct {
        union {
            uint16_t number;
            uint8_t number_byte[2];
        };
        union {
            uint32_t size;
            uint16_t size_word[2];
            uint8_t size_byte[4];
        };
        uint8_t body[4];
    };
} packet;

When I tested it, I ran into some weird behavior. I hope someone can help me figure out what's wrong with this definition. I searched, but found no similar problems.

Here is what I tried:

uint8_t test1[] = {0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a};
int i = 0;
for(i = 0; i < sizeof(test1); i++)
{
    packet.raw[i] = test1[i];
}

printf("packet.size_word[0]=0x%04x\n", packet.size_word[0]);
printf("packet.size_word[1]=0x%04x\n", packet.size_word[1]);

Output is

packet.size_word[0]=0x0605
packet.size_word[1]=0x0807

It completely skipped 0x03 and 0x04.

When I use the following definition (with `uint32_t size;` removed), it works fine.

union packet_t {
    uint8_t raw[10];
    struct {
        union {
            uint16_t number;
            uint8_t number_byte[2];
        };
        union {
            uint16_t size_word[2];
            uint8_t size_byte[4];
        };
        uint8_t body[4];
    };
} packet;

Here is the output:

packet.size_word[0]=0x0403
packet.size_word[1]=0x0605

Does anyone know why this is happening? I thought the members of a union always occupied the same memory location.

Here is the link for the code.

The original code is here.

--------------------------Update 11/25/2018---------------------------------

So I'm now confident that it's structure padding, and that it depends on the CPU architecture.

I tested on an Arduino with an ATmega328P (an 8-bit MCU), and it works like a charm. There is no struct padding there because the MCU processes one byte at a time, so no type needs more than 1-byte alignment.

However, as @selbie mentioned, the code is not portable at all: when writing such code, we have to consider the CPU architecture and environment. The simplest solution is not to use this pattern.

Further reading:

C – Structure Padding

Structure padding and packing

hat
xkimi
  • 4
    Short answer is **padding**. Long answer is you should never try to serialize or parse a network data format (or any binary blob) this way. At least if you want your code to be portable. – selbie Nov 25 '18 at 05:46
  • 6
    2 bytes of padding precede the second union to align `uint32_t size` to a 4-byte boundary. You can see this with: `printf("%zu\n", offsetof(union packet_t, size));` – Brett Hale Nov 25 '18 at 05:47
  • @xkimi. a **union** is a type that stores different data types at the same memory location, but not at the same time. Hence, given the definition of `union packet_t`, the declaration of *packet*, and the assignments made in the _for loop_, **only packet.raw** is set. Therefore, accessing any other member of **union packet_t packet** will yield a weird result. – eapetcho Nov 25 '18 at 05:49
  • 1
    I don't know why you are saying "it missed 0x03, 0x04". 0x01 - 0x04 are all contained within packet::size. If you add the statement `printf("packet.size=0x%04x\n", packet.size);` into your main() you will see them there. – natersoz Nov 25 '18 at 06:17
  • @selbie the code is actually intended to run on an ATmega328P, an 8-bit MCU. Might it work in that environment? A really good lesson to learn about structure padding on different architectures. Thanks for pointing it out. I will try to avoid this in my implementation since it's so architecture dependent. – xkimi Nov 25 '18 at 06:39
  • @eapetcho Type punning with unions is prohibited in C++ but still allowed in C11. (There are some restrictions but I believe they wouldn't be violated when the `struct` was packed.) I'm convinced the actual problem is the unexpected padding in this case. I'm not sure how well the compiler used conforms to the standards, or whether it's really a C compiler (rather than a C++ compiler used to compile C code), which also has to be considered. – Scheff's Cat Nov 25 '18 at 08:15
  • Further reading: [SO: Is type-punning through a union unspecified in C99, and has it become specified in C11?](https://stackoverflow.com/a/11640603/7478597), [SO: Structure padding and packing](https://stackoverflow.com/a/4306269/7478597) – Scheff's Cat Nov 25 '18 at 08:19
