In the sample code below, both Method 1 and Method 2 extract data from the array. What are the advantages and disadvantages of each method? Method 1 looks cleaner, but endianness must be handled carefully. What else?

#include <stdio.h>
#include <stdint.h>

uint8_t data[] = {0x8A, 0x02, 0x03, 0x04, 0x05};

#pragma pack(1)
typedef struct
{
    uint8_t hex2:4;
    uint8_t hex1:4;
    uint16_t num1;
    uint16_t num2;
} RandomStruct;
#pragma pack()

int main()
{
    // Method 1
    RandomStruct *struct1 = (RandomStruct *)data;
    printf("%X\n", struct1->hex1);
    printf("%X\n", struct1->hex2);
    printf("%X\n", struct1->num1);
    printf("%X\n", struct1->num2);
    
    // Method 2
    uint8_t hex1 = data[0] >> 4;
    uint8_t hex2 = data[0] & 0x0F;
    uint16_t num1 = data[1] | (data[2] << 8);
    uint16_t num2 = data[3] | (data[4] << 8);
    printf("%X\n", hex1);
    printf("%X\n", hex2);
    printf("%X\n", num1);
    printf("%X\n", num2);
    
    return 0;
}
Sam

1 Answer

Method 1

Problem 1: struct1->hex1, struct1->hex2, struct1->num1, and struct1->num2 access the memory of data, which has effective type uint8_t (which is likely a character type), but they access that memory through a structure type. The behavior is undefined because this does not conform to the aliasing rules in C 2018 6.5 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: …

(There is a complication here in that the aliasing rules allow “an aggregate or union type that includes one of the aforementioned types among its members,” but I would not expect the nominal use of uint8_t in declaring the bit-field members to satisfy that.)

This problem might be fixed by replacing RandomStruct *struct1 = (RandomStruct *)data; with the declaration RandomStruct struct1;, copying the bytes with memcpy(&struct1, data, sizeof struct1); (which requires #include <string.h>), and changing each subsequent struct1-> to struct1. (member access with the dot operator).

Problem 2: The C standard does not specify the order in which hex2 and hex1 are located in the storage unit used to hold them, per C 2018 6.7.2.1 11:

… The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined…

Note that the order of allocation of bit-fields is not determined by the endianness of the C implementation. Regardless of whether a C implementation puts low-significance bytes of integer types earlier in memory or later in memory, it may place the first bit-field of a structure in the low bits of the storage unit or in the high bits of the storage unit.

This cannot be fixed portably in C (that is, using only strictly conforming code); you must ensure the members are declared in the necessary order for each C implementation you use.

Problem 3: The two bit-fields of the structure do not necessarily occupy just one byte. C 2018 6.7.2.1 11 says:

An implementation may allocate any addressable storage unit large enough to hold a bit-field…

This is a separate issue from packing the structure, as the storage unit used for bit-fields would be considered as bytes used for the bit-fields, not padding bytes that are eliminated by packing.

This cannot be fixed portably in C but a violation of the desired layout could be detected with _Static_assert(sizeof (RandomStruct) == 5, "Error, expected just five bytes in RandomStruct.");.

Method 2

Problem 1: data[1] | (data[2] << 8) is not guaranteed to work in every C implementation because data[2] will be promoted to int, and the C standard allows int to be 16 bits. If the high bit of data[2] is set, then data[2] << 8 would produce a result greater than or equal to 32,768, which is not representable in a 16-bit int. Then overflow occurs, and the behavior is not defined by the C standard, per C 2018 6.5.7 4:

… If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

This could be fixed by casting data[2] to uint16_t and similarly for data[4].

Eric Postpischil
  • Thanks for your explanation! For Problem 1 in Method 1, I understand that writing through an alias may lead to unexpected behavior because the compiler may optimize away a load on the next access. However, I do not understand what the potential problem is if I am only reading via the alias, as in the sample code. Do you have an example? – Sam Jul 10 '23 at 03:14
  • @Sam: Suppose we have a routine `foo` that can see the external definition of `data` as `uint8_t data[] = {…};` and is also passed a pointer to a `RandomStruct`: `void foo(RandomStruct *p) { printf("%d\n", p->hex2); data[0] = -1; printf("%d\n", p->hex2); }`. The C standard allows the compiler to reason here that `p` either does not point to `data` because its type does not conform to the aliasing rules or that it does and the behavior is not defined… – Eric Postpischil Jul 10 '23 at 17:23
  • … In either case, this means the compiler is allowed to generate code that loads the value to be printed for `p->hex2`, then writes to `data[0]`, then uses the already loaded value for `p->hex2` without reloading it to account for the write to `data[0]`. So a compiler could generate that code, and the second `printf` of `p->hex2` would use the old value, not the updated value. – Eric Postpischil Jul 10 '23 at 17:24
  • Let's say there is no `data[0] = -1;` in the `foo` routine. Does simply reading the value cause any issue? – Sam Jul 11 '23 at 03:44