I am trying to cast a byte stream (raw data from serial port) into a structure for ease of use. I have managed to replicate the problem in a minimal working example:

#include <stdio.h>

typedef struct {
    unsigned int source: 4;
    unsigned int destination: 4;
    char payload[15];
} packet;

int main(void)
{
    // machine 9 sends a message to machine 10 (A)
    char raw[20] = {0x9A, 'H', 'e', 'l', 'l', 'o', '!', 0};
    packet *message = (packet *)raw;
    printf("machine %d ", message->source);
    printf("says '%s' to ", message->payload);
    printf("machine %d.\n", message->destination);
    return 0;
}

I would expect the field source to get 9 from 0x9A and destination to get A from 0x9A so that the output says:

machine 9 says 'Hello!' to machine 10.

But I get:

machine 10 says 'Hello!' to machine 9.

Any idea why this might be so?

First User
  • `packet *message = (packet *)raw;` invokes undefined behavior in several ways. There's no guarantee about alignment and it's a strict aliasing violation (https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). Additionally, there are few guarantees about how these bitfields are stored internally inside the struct. – Lundin Feb 02 '23 at 09:00
  • Many details about bitfields are implementation-defined. You cannot rely on any specific order in memory. – Gerhardh Feb 02 '23 at 09:01
  • Additionally, `0x9A` may be too large to fit inside a `char`. Never use the `char` type for storing raw binary data, since it has implementation-defined signedness. Use `uint8_t`/`unsigned char` instead. – Lundin Feb 02 '23 at 09:05
  • Just my opinion, but: If you're trying to read/write data to/from streams, and if you want "ease of use", do *not* try to define a C struct that exactly matches your byte stream format. It's an attractive and very popular technique, and it seems easy enough at first, but it's not; it's actually a miserable slog, rife with implementation-defined behavior and unportabilities. – Steve Summit Feb 06 '23 at 15:18

3 Answers

I am trying to cast a byte stream (raw data from serial port) into a structure for ease of use.

char raw[20] = {0x9A, 'H', 'e', 'l', 'l', 'o', '!', 0};
packet *message = (packet *)raw;

This is poor code for several reasons.

  • Alignment: (packet *)raw risks undefined behavior when the alignment requirement of the structure packet exceeds that of a char array.

  • Size: The members .source and .destination might not be packed into 1 byte. Many attributes of bit-fields are implementation-defined. The overall size of raw[] (20) may also differ from sizeof(packet).

  • Aliasing: The compiler may assume that changes to raw[] do not affect *message.


What should be done depends on the unposted larger code.
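
As one possible direction, here is a minimal sketch that decodes the bytes explicitly instead of casting, which avoids the alignment, size, and aliasing concerns listed above. It assumes the wire format is one header byte with the source in the high nibble and the destination in the low nibble (as the question's expected output implies), followed by the payload. The names parsed_packet, parse_packet, and PAYLOAD_MAX are hypothetical and not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_MAX 15  /* same capacity as payload[] in the question */

/* Decoded form of a packet; its in-memory layout never has to match the wire. */
typedef struct {
    uint8_t source;                   /* assumed: high nibble of the header byte */
    uint8_t destination;              /* assumed: low nibble of the header byte */
    char    payload[PAYLOAD_MAX + 1]; /* always NUL-terminated after parsing */
} parsed_packet;

/* Hypothetical helper: decode len bytes from raw into *out.
   Returns 0 on success, -1 if there is not even a header byte. */
int parse_packet(const uint8_t *raw, size_t len, parsed_packet *out)
{
    assert(raw != NULL && out != NULL);
    if (len < 1)
        return -1;

    /* Shifts and masks work on values, so the result does not depend on
       how the compiler would lay out bit-fields. */
    out->source      = (uint8_t)(raw[0] >> 4);
    out->destination = (uint8_t)(raw[0] & 0x0F);

    /* Copy at most PAYLOAD_MAX payload bytes and terminate the string. */
    size_t n = len - 1;
    if (n > PAYLOAD_MAX)
        n = PAYLOAD_MAX;
    memcpy(out->payload, raw + 1, n);
    out->payload[n] = '\0';
    return 0;
}
```

Calling parse_packet on a uint8_t buffer holding the question's bytes would then fill source, destination, and payload from the values themselves, whatever the compiler's struct layout is.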

chux - Reinstate Monica

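In 0x9A, the lowest 4 bits are A and the highest 4 bits are 9.

If you compile with GCC for a little-endian target, bit-fields are allocated starting from the least significant bit, so member source (occupying the lower nibble) gets A and destination (occupying the higher nibble) gets 9.

So the program's output is correct for that layout.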
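
The nibble arithmetic above, and a way to check which nibble your own compiler gives to the first-declared bit-field, can be sketched as follows. The helper names are hypothetical. The probe copies the byte into the struct with memcpy rather than casting a pointer, and its result is implementation-defined, so it may differ between compilers and targets.

```c
#include <assert.h>
#include <string.h>

/* Plain arithmetic: the result is the same on every conforming compiler. */
unsigned high_nibble(unsigned char byte) { return (byte >> 4) & 0x0Fu; }
unsigned low_nibble(unsigned char byte)  { return byte & 0x0Fu; }

/* The same two 4-bit fields as in the question's struct, without the payload. */
struct nibbles {
    unsigned int first  : 4;
    unsigned int second : 4;
};

/* Hypothetical probe. Returns 1 if `first` lands in the low nibble of the
   struct's first byte, 0 if it lands in the high nibble, and -1 if this
   implementation stores the fields in some other way. */
int first_field_is_low_nibble(unsigned char probe)
{
    struct nibbles n;
    unsigned first, second;

    /* The probe byte must have two different nibbles to tell them apart. */
    assert(high_nibble(probe) != low_nibble(probe));

    memset(&n, 0, sizeof n);
    memcpy(&n, &probe, 1);  /* set only the first byte of the struct */
    first  = n.first;
    second = n.second;

    if (first == low_nibble(probe) && second == high_nibble(probe))
        return 1;
    if (first == high_nibble(probe) && second == low_nibble(probe))
        return 0;
    return -1;
}
```

If first_field_is_low_nibble(0x9A) returns 1 on your toolchain, that matches the layout described in this answer.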

0___________

I am trying to cast a byte stream (raw data from serial port) into a structure for ease of use.

Interpreting a raw byte sequence in some random protocol as the representation of a structure type requires taking the target ABI into account for full details of structure layout, including bitfields, and perhaps also understanding and applying the C extensions made available by your compiler. Since it is ABI- and compiler-dependent, the result is usually non-portable.

I have managed to replicate the problem in a minimal working example: [...] I would expect [...]

There is not much you can safely expect here without referring to the ABI. What you can rely on is that

  • the source and destination bitfields are packed into adjacent ranges of bits in the same "addressable storage unit" (ASU).
  • the ASU containing those starts at the first byte of the overall structure.

But you cannot assume

  • anything about the size of the ASU containing the bitfields (other than that it is no smaller than the smallest addressable unit of storage),
  • the relative order of the bitfields within it,
  • which 8 bits of the ASU store the two bitfields' representations, if it is larger than 8 bits,
  • that the storage for payload starts at the byte immediately following the ASU, or
  • that the last byte of the payload member is the last byte of the overall structure.
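
Some of these unknowns can be inspected for a particular compiler and target. The following is a minimal sketch using sizeof and offsetof on the question's packet type; the helper function names are hypothetical. The values it reports are implementation-defined, so they describe only the toolchain that compiled it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The structure from the question. */
typedef struct {
    unsigned int source: 4;
    unsigned int destination: 4;
    char payload[15];
} packet;

/* Total size of the structure, including any padding. */
size_t packet_size(void)
{
    return sizeof(packet);
}

/* Number of bytes before payload: the bit-field storage plus any padding. */
size_t payload_offset(void)
{
    return offsetof(packet, payload);
}

/* Bytes of padding after the last byte of payload. */
size_t trailing_padding(void)
{
    return sizeof(packet)
         - (offsetof(packet, payload) + sizeof(((packet *)0)->payload));
}

/* Print the layout that this particular implementation chose. */
void print_packet_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "sizeof(packet)            = %zu\n", packet_size());
    fprintf(out, "offsetof(packet, payload) = %zu\n", payload_offset());
    fprintf(out, "trailing padding          = %zu\n", trailing_padding());
}
```

Calling print_packet_layout(stdout) shows whether payload directly follows a single byte and whether the structure ends with padding. Note that offsetof cannot be applied to the bit-field members themselves.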

I get:

machine 10 says 'Hello!' to machine 9.

Any idea why this might be so?

Your C implementation appears to do what you expected when accessing the char array via a packet *, though that behavior is in fact undefined. The output reveals that it has chosen a 1-byte ASU for the two bitfields, with no padding between that and the payload, and that it lays out the bitfields starting at the least-significant end of the ASU. That is well within the bounds of the C implementation's discretion.

John Bollinger