c - casting uint8_t* to uint32_t* behaviour

Question

I have read this question: How does casting uint8* to uint32* work? but I am unsure of the answer given.

I'm a newbie embedded C programmer working on a project that uses GCC, and I've been refactoring chunks of code to reduce memory usage. One example is where I changed the data types of some variables from uint32_t to smaller types:

uint8_t colour;
uint16_t count;
uint16_t pixel;

func((uint32_t*)&colour);
func((uint32_t*)&count);
func((uint32_t*)&pixel);

Here func(uint32_t* ptr) overwrites the value it points to with data it receives on a slave port. The problem is that the above code does not behave correctly when -O1, -O2, -O3 or -Os optimisation is enabled. For example, when the values 1, 5 and 1 are received on the comms port by the three calls respectively, the variables end up set to 0, 0 and 1. When no optimisation is enabled, the values are set correctly.

The code behaves correctly if I change the data types back to uint32_t. I don't understand why I don't get any warnings from the compiler (I have extra warnings turned on). Is the cause something to do with endianness/alignment?

What are the pitfalls of upcasting from a uint8_t pointer to a uint32_t pointer?

Assuming `func` dereferences the pointer at any point, the behavior is undefined. — Christian Gibbons, Mar 27 '20 at 16:48
See Andrew's answer for the technical reasons why you can't do this as your functions are defined. **But**, I'd suggest _not_ doing it at all. _Unless_ you have much more data than this, you won't save much. How much total memory do you have and how much do you estimate/expect to be able to save? How full is your memory now? (e.g.) If you have: `uint32_t array[1000];` and _can_ convert to `uint8_t array[1000];`, is it worth the code complication? — Craig Estey, Mar 27 '20 at 17:15
@CraigEstey I wasn't aware it was UB as I didn't get a compiler warning even with `-fstrict-aliasing` and `-Wstrict-aliasing` turned on but I will change it now. The reason I was changing the code was to save as many bytes as I can (project uses 7669/8192 total memory already) — Nubcake, Mar 27 '20 at 17:21
I just did refactoring for an embedded system, so I'm familiar with the problem space. Dare I ask what [ancient/atrocious] processor is this for (e.g. `PIC18F`)? It seems like the processor is decades old. Does this have to work on the _legacy_ hardware or is this for future product(s) where you could use a newer CPU with a bit more memory? How long before the CPU manufacturer stops making the chip? A Raspberry Pi is $25-35 and comes with gigabytes of RAM. So we ought to have at least 64KB of ram which was available circa 1985 ;-) — Craig Estey, Mar 27 '20 at 17:31
@CraigEstey We are using ATxmega128A1U for this particular project ;) It's not really up to me to decide what we use haha ;) — Nubcake, Mar 27 '20 at 17:35
I'm familiar with variants of this processor line--Microchip _strikes_ again :-( If this is for a _new_ product, the selection of this chip is malpractice, pure and simple. There are many equivalents that have more capability. This is underpowered even by Microchip's meager performance standards. — Craig Estey, Mar 27 '20 at 17:45

score 4 · Accepted Answer · answered Mar 27 '20 at 17:08

TLDR

Doing something like passing the address of a uint8_t to something expecting the address of a uint32_t can result in corrupted memory, unknown results, subtle bugs, and code that just plain blows up.

DETAILS

First, if the function is declared as

void func( uint32_t * arg );

and modifies the data arg points to, passing it the address of a uint8_t or a uint16_t will lead to undefined behavior and likely data corruption - if it runs at all (keep reading...). The function will modify data that is not actually part of the object the pointer passed to the function refers to.

The function expects to have access to the four bytes of a uint32_t, but you gave it the address of a single uint8_t byte. Where do the other three bytes go? They likely stomp on something else.

And even if the function only reads the memory and doesn't modify it, you don't know what's in the memory not in the actual object, so the function may behave unpredictably. And the read might not even work at all (keep reading again...).
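One way to keep the smaller variables without handing func a pointer to the wrong object is to receive into a genuine uint32_t and narrow afterwards. The following is only a sketch: the body of func here is a hypothetical stand-in (the real one reads from the slave port), and next_rx_value, read_u8 and read_u16 are names invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for incoming data; not part of the original code. */
static uint32_t next_rx_value;

/* Stand-in for the question's func(): it writes a full 32-bit value
 * through the pointer, so the pointer must refer to a real uint32_t. */
static void func(uint32_t *ptr)
{
    *ptr = next_rx_value;
}

/* Receive into a temporary uint32_t, then narrow to 8 bits.
 * The conversion keeps the low 8 bits of the received value. */
static uint8_t read_u8(void)
{
    uint32_t tmp = 0;
    func(&tmp);
    return (uint8_t)tmp;
}

/* Same idea for 16-bit variables: keeps the low 16 bits. */
static uint16_t read_u16(void)
{
    uint32_t tmp = 0;
    func(&tmp);
    return (uint16_t)tmp;
}
```

With these helpers, the calls from the question would become colour = read_u8(); count = read_u16(); pixel = read_u16();. The temporary lives on the stack only for the duration of the call, so the variables themselves can stay small.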

Additionally, casting the address of a uint8_t to a uint32_t * and then accessing the object through that pointer is a strict aliasing violation. See What is the strict aliasing rule?. To summarize that, in C you cannot safely refer to an object as something that it is not, with the only exception being that you can refer to any object as if it were composed of the proper number of [signed|unsigned] char bytes.

But casting a uint8_t address to a uint32_t * means you are trying to access a set of four unsigned char values (assuming uint8_t is actually unsigned char, which is almost certainly true nowadays) as a single uint32_t object, and that's a strict aliasing violation, undefined behavior, and not safe.

The symptoms you see from violating the strict aliasing rule can be subtle and really hard to find and fix. See gcc, strict-aliasing, and horror stories for some, well, horror stories.
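When bytes really do need to be read as a uint32_t, two approaches that stay within the aliasing rules are copying with memcpy into a real uint32_t object, or assembling the value from individual bytes. This is a minimal sketch; load_u32 and load_u32_le are hypothetical helper names, and both assume the buffer holds at least four bytes.

```c
#include <stdint.h>
#include <string.h>

/* Copy four bytes into a real uint32_t object. No uint32_t lvalue ever
 * refers to the byte buffer, so there is no aliasing violation and no
 * alignment requirement on `bytes`. The result depends on the byte order
 * of the machine running the code. */
static uint32_t load_u32(const uint8_t *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Assemble a value from bytes stored least-significant first.
 * The result is the same regardless of the host's byte order. */
static uint32_t load_u32_le(const uint8_t *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain load, so it need not cost more than the cast, though that is an optimisation detail rather than a guarantee.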

Additionally, if you refer to an object as something it's not, you can run afoul of 6.3.2.3 Pointers, paragraph 7 of the C (C11) standard:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
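To make the alignment requirement concrete, here is a sketch of a check for whether an address is suitably aligned for uint32_t. The helper name is_aligned_for_u32 is hypothetical, and the check assumes that converting a pointer to uintptr_t yields a plain numeric address, which is implementation-defined but holds on mainstream flat-address platforms. Passing this check says nothing about aliasing; it only addresses the alignment clause quoted above.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when `p` is a multiple of the alignment uint32_t requires
 * on this implementation. Assumes (uintptr_t)p is the numeric address. */
static bool is_aligned_for_u32(const void *p)
{
    return ((uintptr_t)p % alignof(uint32_t)) == 0;
}
```

On targets where alignof(uint32_t) is 1, as is typical for 8-bit AVR parts, every address passes, which is one reason misaligned access may never show up as a fault there even though the aliasing problem remains.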

It's not safe even on x86 no matter what someone might tell you about x86-based systems.

If you ever hear someone say, "Well, it works so that's all wrong.", well, they're very, very wrong.

They just haven't observed it failing.

Yet.
