In my recent post about endianness, I was told that one should unionize the types because not doing so can cause UB.
No. You were told that using a union is one way to do what you were after without invoking UB. It's not the only way.
"strict pointer aliasing violation" is mentioned as a problem when not using unions.
The usual terminology is "strict aliasing", not "strict pointer aliasing". Using a union appropriately is one way to avoid strict aliasing violations, but not the only way.
it is mentioned that 8-bit types don't have to have the same memory alignment as 32 bit types, and that this affects the success of casting.
Yes.
I don't understand this.
So I would like to know more about the scenarios in which UB can
happen in the context of accessing 8 bit array members and casting
them to 32 bit to generate a certain endianness via bitshifting, and
how unionizing prevents it.
The first thing to understand is that in this context, "undefined behavior" means that the C language specification does not define the behavior. It does not mean that the program will necessarily behave differently on your machine than you expect (though it very well might do). It does not mean that the compiler must reject the program, or that the program must diagnose an error at runtime, or such -- all of those are possible, but requiring one or more would make that defined behavior.
C aims to be suitable for implementation on substantially all digital computers, including especially on bare metal -- it was, after all, conceived as a language for writing operating systems. Many of the items that the spec explicitly calls out as having undefined behavior are related to differences in the actual behavior of various machines, past, present, or hypothetical future.
The code in your previous post had this form:
uint8_t buffer[4];
// ... assign array element values ...
switch (*((uint32_t *)buffer)) {
// ...
The question touches on two main provisions of the language spec. I'll be quoting from C17, but all versions of the language spec to date have substantially equivalent versions of these provisions. Taking the casting question first, because it's straightforward, paragraph C17 6.3.2.3/7 is the main provision allowing conversion among object-pointer types. It says:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting
pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise,
when converted back again, the result shall compare equal to the original pointer. When a pointer to
an object is converted to a pointer to a character type, the result points to the lowest addressed byte
of the object. Successive increments of the result, up to the size of the object, yield pointers to the
remaining bytes of the object.
(Emphasis added.)
Thus, when people tell you that casting a uint8_t * to type uint32_t * might produce UB on account of mismatching alignment, that's coming directly from the language spec.
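To make the alignment point concrete, here is a minimal sketch of how one might inspect alignment on a typical implementation. The function name is_aligned_for_u32 is my own invention for illustration. It also assumes that converting a pointer to uintptr_t yields a plain numeric address, which is implementation-defined but holds on mainstream platforms.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is suitably aligned for a
   uint32_t. Assumes the pointer-to-uintptr_t conversion produces an
   ordinary address, which is implementation-defined behavior. */
int is_aligned_for_u32(const void *p)
{
    return (uintptr_t)p % alignof(uint32_t) == 0;
}
```

A uint8_t array may legitimately start at an address for which this check fails, since its own alignment requirement can be as small as 1. An object declared as uint32_t always passes.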
The other main relevant portion of the language spec is the so-called "strict aliasing rule", which in C17 is paragraph 6.5/7:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
- a character type.
Footnote 89 explains that
The intent of this list is to specify those circumstances in which an object may or may not be aliased.
And that's why, even if your conversion (uint32_t *)buffer has well-defined behavior, you still invoke UB by attempting to access an object via the resulting pointer. The lvalue in question, *((uint32_t *)buffer), has type uint32_t, and that does not satisfy any of the alternatives given in the strict aliasing rule for accessing a uint8_t or an array of such. (Do not be confused by the term "effective type". Except where dynamic memory allocation is involved, you can understand "effective type" simply as "type".)
On the other hand, note that the strict aliasing rule explicitly allows access through a union. Furthermore, the union approach also solves the alignment issue, because a union's alignment requirement has to be at least as strict as that of its most strictly aligned member.
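A minimal sketch of the union approach might look like the following. The names word_bytes and word_from_bytes_union are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical union: four bytes and one 32-bit word sharing storage.
   The union is aligned at least as strictly as uint32_t requires. */
union word_bytes {
    uint8_t  bytes[4];
    uint32_t word;
};

/* Stores four bytes through one member and reads the same storage back
   through the other. The numeric result depends on the host byte order. */
uint32_t word_from_bytes_union(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    union word_bytes u;
    u.bytes[0] = b0;
    u.bytes[1] = b1;
    u.bytes[2] = b2;
    u.bytes[3] = b3;
    return u.word;
}
```

No pointer conversion appears anywhere here, so neither the alignment provision nor the strict aliasing rule is in play.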
If you study that carefully, you may also see that there is at least one approach that works without involving a union: instead of declaring an array of uint8_t and accessing it via a uint32_t *, declare a uint32_t and access its bytes via a uint8_t *. This does assume that uint8_t is a character type, and the spec does not guarantee that, but in practice, if your implementation provides uint8_t at all, then it will be an alias for unsigned char or possibly for char.
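Here is a sketch of that reversed arrangement. To sidestep the question of whether uint8_t is a character type, it goes through unsigned char *, which always is one. The function name word_from_bytes_charptr is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the object is declared as uint32_t, and its bytes
   are written through a character-type pointer, which the strict aliasing
   rule permits. The numeric result depends on the host byte order. */
uint32_t word_from_bytes_charptr(const uint8_t src[4])
{
    uint32_t word = 0;
    unsigned char *p = (unsigned char *)&word;

    for (size_t i = 0; i < sizeof word; i++) {
        p[i] = src[i];
    }
    return word;
}
```

The conversion to unsigned char * cannot run afoul of the alignment provision, because character types have the weakest possible alignment requirement.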
Another way, which does not rely on uint8_t being a character type, would involve copying the contents of your uint8_t array to an object of type uint32_t via the memcpy() function.
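A sketch of the memcpy() approach follows, with hypothetical function names. Since your goal was to produce a particular endianness via bit shifting, I have also included a shift-based variant that reads only uint8_t values and so involves no aliasing or alignment concerns at all. It assumes the buffer holds the value in little-endian order.

```c
#include <stdint.h>
#include <string.h>

/* Copies the four bytes into a genuine uint32_t object.
   The numeric result depends on the host byte order. */
uint32_t word_from_bytes_memcpy(const uint8_t buffer[4])
{
    uint32_t word;
    memcpy(&word, buffer, sizeof word);
    return word;
}

/* Assembles the value arithmetically, treating buffer[0] as the least
   significant byte. The result is the same on every host. */
uint32_t word_from_le_bytes(const uint8_t buffer[4])
{
    return  (uint32_t)buffer[0]
         | ((uint32_t)buffer[1] << 8)
         | ((uint32_t)buffer[2] << 16)
         | ((uint32_t)buffer[3] << 24);
}
```

For example, given the bytes 1, 2, 3, 4, word_from_le_bytes() returns 0x04030201 regardless of the machine, whereas word_from_bytes_memcpy() returns whatever the host's byte order makes of them.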