Is it well-defined to hold a misaligned pointer, as long as you don't ever dereference it?

Question

I have some C code that parses packed/unpadded binary data that comes in from the network.

This code was/is working fine under Intel/x86, but when I compiled it under ARM it would often crash.

The culprit, as you might have guessed, was unaligned pointers -- in particular, the parsing code would do questionable things like this:

uint8_t buf[2048];
[... code to read some data into buf...]
int32_t nextWord = *((int32_t *) &buf[5]);  // misaligned access -- can crash under ARM!

... that's obviously not going to fly in ARM-land, so I modified it to look more like this:

uint8_t buf[2048];
[... code to read some data into buf...]
int32_t * pNextWord = (int32_t *) &buf[5];
int32_t nextWord;
memcpy(&nextWord, pNextWord, sizeof(nextWord));  // slower but ARM-safe

My question (from a language-lawyer perspective) is: is my "ARM-fixed" approach well-defined under the C language rules?

My worry is that maybe even just having a misaligned-int32_t-pointer might be enough to invoke undefined behavior, even if I never actually dereference it directly. (If my concern is valid, I think I could fix the problem by changing pNextWord's type from (const int32_t *) to (const char *), but I'd rather not do that unless it's actually necessary to do so, since it would mean doing some pointer-stride arithmetic by hand)

Accessing the contents of `pNextWord` will give a strict aliasing violation, regardless of alignment. So you have two cases of major UB here. Use `memcpy` to avoid that bug too. — Lundin, Jul 06 '18 at 08:34
You could do away with `pNextWord` and just do `memcpy(&nextWord, &buf[5], sizeof(nextWord));` — dbush, Jul 06 '18 at 14:35

score 22 · Accepted Answer · edited Aug 05 '19 at 00:29

No, the new code still has undefined behaviour. C11 6.3.2.3p7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. [...]

It doesn't say anything about dereferencing the pointer - even the conversion has undefined behaviour.

Indeed, the modified code that you assume is ARM-safe might not even be Intel-safe. Compilers are known to generate code for Intel that can crash on unaligned access. That is not what happens in the linked case, but a clever compiler could take the conversion as proof that the address is indeed aligned and use specialized code for memcpy.
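
To see whether a byte address would be suitably aligned for int32_t without first converting it to int32_t *, the address can be inspected as an integer. This is a minimal sketch and not part of the original answer: the helper name is_aligned_for_i32 is hypothetical, and it assumes that converting a pointer to uintptr_t yields the numeric address, which is implementation-defined but holds on mainstream platforms. Passing this check only addresses the alignment rule of 6.3.2.3p7; it does nothing about the effective type rule of 6.5p7.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is suitably aligned for int32_t.
   Assumes (uintptr_t)p reflects the numeric address (implementation-defined). */
static bool is_aligned_for_i32(const void *p)
{
    return ((uintptr_t)p % alignof(int32_t)) == 0;
}
```

Taking a const void * keeps the check itself free of any conversion to int32_t *, so a misaligned address can be passed in safely.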

Alignment aside, your first excerpt also suffers from strict aliasing violation. C11 6.5p7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

Since the array buf[2048] has a declared type, its elements have that type (uint8_t, in practice a typedef for unsigned char) as their effective type. You may therefore access the contents of the array only as characters, not as int32_t.

I.e., even

int32_t nextWord = *((int32_t *) &buf[_Alignof(int32_t)]);

has undefined behaviour.

You're misinterpreting C11 6.3.2.3p7, that talks about misalignment against the referenced type, i.e. integer, struct, etc, and not against memory access misalignment. — DewiW, Jul 06 '18 at 14:00
Putting aside alignment, I believe the access is well defined as the contents of `buf` are *not* being accessed as `int`. `pNextWord` is converted to a `void *` when passed to the `memcpy` function, which then copies the bytes in a safe manner. — dbush, Jul 06 '18 at 14:34
@dbush you're correct. Missing clarification there, I meant the original code. — Antti Haapala -- Слава Україні, Jul 06 '18 at 15:39

score 8 · Answer 2 · edited Jul 06 '18 at 09:43

To safely parse a multi-byte integer across compilers and platforms, you can extract each byte and assemble the bytes into an integer according to the data's endianness. For example, to read a 4-byte integer from a big-endian buffer:

uint8_t* buf = any address;

uint32_t val = 0;
uint32_t  b0 = buf[0];
uint32_t  b1 = buf[1];
uint32_t  b2 = buf[2];
uint32_t  b3 = buf[3];

val = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
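
The same idea can be wrapped in small functions. This is a sketch; the names read_be32 and read_le32 are hypothetical and do not come from the answer. Each byte is widened to uint32_t before shifting, so no shift ever reaches the sign bit of a promoted int.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from any byte address. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: the little-endian counterpart. */
static uint32_t read_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Because only individual bytes are read, a call such as read_be32(&buf[5]) places no alignment requirement on the address and gives the same result regardless of the host's byte order.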

answered Jul 06 '18 at 09:03 · lee qiaoping

though this code has undefined behaviour as well :D (but for example GCC guarantees correct behaviour). `b0` would be promoted to a *signed int* and then 1 might be shifted to the sign bit of the 32-bit `int` - better to declare all b0 - b3 as `uint32_t`. – Antti Haapala -- Слава Україні, Jul 06 '18 at 09:10
Example code is refactored, Thanks @ Antti Haapala for notice, most of my work is on linux/windows, ^^. – lee qiaoping, Jul 06 '18 at 09:43

score 5 · Answer 3 · answered Jul 10 '18 at 23:00

Some compilers may assume no pointer will ever hold a value that is not properly aligned for its type, and perform optimizations that rely upon that. As a simple example, consider:

void copy_uint32(uint32_t *dest, uint32_t *src)
{
  memcpy(dest, src, sizeof (uint32_t));
}

If both dest and src hold 32-bit aligned addresses, the above function could be optimized to one load and one store even on platforms that don't support unaligned accesses. If the function had been declared to accept arguments of type void*, however, such an optimization would not be allowed on platforms where unaligned 32-bit accesses would behave differently from a sequence of byte accesses, shifts, and bit-wise operations.
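
For contrast, here is a sketch of the void * variant described above (the name copy_uint32_any is hypothetical). Its parameters carry no alignment promise, so the compiler has to emit code that works for any address, and a caller can pass a pointer into the middle of a byte buffer without ever forming a uint32_t *.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical variant: copies 4 bytes between addresses of any alignment. */
static void copy_uint32_any(void *dest, const void *src)
{
    memcpy(dest, src, sizeof(uint32_t));
}
```

A call such as copy_uint32_any(&value, &buf[1]) involves only conversions to void *, which carry no alignment requirement.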

score 2 · Answer 4 · answered Jul 06 '18 at 19:24

As mentioned in Antti Haapala's answer, simply converting a pointer to another type when the resulting pointer is not properly aligned invokes undefined behavior as per section 6.3.2.3p7 of the C standard.

Your modified code only uses pNextWord to pass to memcpy, where it gets converted into a void *, so you don't even need a variable of type int32_t *. Just pass the address of the first byte in the buffer you want to read from to memcpy. Then you don't need to worry about alignment at all.

uint8_t buf[2048];
[... code to read some data into buf...]
int32_t nextWord;
memcpy(&nextWord, &buf[5], sizeof(nextWord));
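
If the concern is having to do the stride arithmetic by hand, it can be kept in one place. The following is a sketch under assumptions: byte_cursor and cursor_read_i32 are hypothetical names, and the value is copied out in host byte order, exactly as the memcpy call above does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a byte buffer; never holds an int32_t pointer. */
typedef struct {
    const uint8_t *pos;  /* next unread byte */
    const uint8_t *end;  /* one past the last valid byte */
} byte_cursor;

/* Copy the next int32_t out (host byte order) and advance the cursor.
   Returns 0 on success, -1 if fewer than sizeof(int32_t) bytes remain. */
static int cursor_read_i32(byte_cursor *c, int32_t *out)
{
    if ((size_t)(c->end - c->pos) < sizeof *out)
        return -1;
    memcpy(out, c->pos, sizeof *out);
    c->pos += sizeof *out;
    return 0;
}
```

The cursor would be set up with something like byte_cursor c = { &buf[5], buf + sizeof buf }; after that, each call consumes four bytes, and no misaligned int32_t * is ever created.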

Right, but my question was about what happens if you do have such a pointer. If I removed the `uint32_t *` pointer from my example, then it's no longer an example. — Jeremy Friesner, Jul 06 '18 at 19:38
