How to understand the following paragraph

Question

Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the "int" as if it were a "float"). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don't want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as "Type-Based Alias Analysis" (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:

How can you use the memcpy function for type coercion? And what about the exception to char*?

I don't understand how to use the memcpy function for type coercion.

In regard to the first part: casting an int* to a float* and dereferencing it causes the bits to be interpreted as a float, which could be anything depending on the compiler/system, endianness, etc. In addition, if one type is larger than the other, casting to the larger type and dereferencing can read memory that doesn't belong to the program. Using `memcpy` gets around this by specifying the number of bytes to copy, but it is still undefined behaviour if you copy the wrong number of bytes. Furthermore, the copied memory may not represent a valid float once converted. — CoderMuffin, Feb 08 '23 at 18:17
@CoderMuffin more importantly, it violates the strict aliasing rules (and it is UB by definition). — 0___________, Feb 08 '23 at 20:10

Steve Summit · Accepted Answer · 2023-02-08T21:42:35.727

Suppose you have the float value 1.25. And suppose you want to confirm that its actual IEEE-754 representation in hexadecimal is 3fa00000. There are at least four different ways you might try to do this:

(1) Take a float pointer and cast it to an integer pointer, and indirect on it:

float f = 1.25;
printf("%08x\n", *(uint32_t *)&f);

(This fragment quietly assumes 32-bit int. For better portability, you could use printf("%08" PRIx32 "\n", *(uint32_t *)&f);.)

(2) Use a union:

union {float f; uint32_t i;} u;
u.f = f;
printf("%08x\n", u.i);

(3) Use a char pointer, and iterate/index:

unsigned char *p = (unsigned char *)&f;
for(int i = 3; i >= 0; i--) printf("%02x", p[i]);

(Note that this code fragment assumes little-endian.)

(4) Use memcpy:

uint32_t x;
memcpy(&x, &f, 4);
printf("%08x\n", x);

Now, the take-home lesson is that not all of these methods work reliably any more, because of the strict aliasing rule.

In particular, method (1) is flatly illegal. It's a textbook example of what the strict aliasing rule disallows.

I think you're still allowed to use a union as in method (2), but you may have to put on a language lawyer hat to convince yourself of it. (See also the comments on this answer below.)
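For what it's worth, here is a minimal sketch of method (2) wrapped in a function. The name float_bits_via_union is hypothetical, and the sketch assumes a 32-bit IEEE-754 float and a compiler that follows the C99 TC3 (or later) reading under which reading a union member other than the one last written reinterprets the stored bytes. It is C only; do not carry it over to C++.

```c
#include <stdint.h>

/* Assumption: float is a 32-bit IEEE-754 value on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: read the bits of a float through a union (C only). */
static uint32_t float_bits_via_union(float f)
{
    union { float f; uint32_t i; } u;
    u.f = f;      /* store as float */
    return u.i;   /* read the same bytes back as an integer */
}
```

With those assumptions in place, float_bits_via_union(1.25f) should give 0x3fa00000 on the usual platforms.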

Methods (3) and (4), however, continue to work, because they take advantage of an explicit exception to the strict aliasing rule, namely that you are allowed to access the bits of an object using a punned pointer of the "wrong" type, as long as the "wrong type" is specifically a character pointer.

So I think this is clear, but in answer to your specific questions:

How can you use the memcpy function for type coercion?

As in method (4).
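Spelled out a little more, method (4) might look like the sketch below. The helper names float_to_bits and bits_to_float are hypothetical, and the code assumes that float is a 32-bit IEEE-754 value and that float and uint32_t share the same byte order, which is common but not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE-754 value on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: copy the bytes of a float into an integer. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* byte copy, no pointer cast involved */
    return bits;
}

/* Hypothetical inverse: copy the bytes of an integer into a float. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Using sizeof for the byte count, together with the static assertion, avoids the "copied the wrong number of bytes" problem mentioned in the comments on the question.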

And what about the exception to char *?

That's the explicit exception in the strict aliasing rule that allows method (3) to work.
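As a rough sketch of the same exception applied to any object, the helper below walks an object's bytes through an unsigned char pointer and writes them out as hex in memory order, so it makes no endianness assumption itself. The name object_bytes_hex is hypothetical, and the caller is assumed to supply a buffer of at least 2 * n + 1 characters.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: hex-dump the n bytes of *obj, in memory order,
 * into out. The caller must provide room for 2 * n + 1 characters. */
static void object_bytes_hex(const void *obj, size_t n, char *out)
{
    const unsigned char *p = obj;  /* character-type access is permitted */
    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", (unsigned)p[i]);
    out[2 * n] = '\0';
}
```

Called as object_bytes_hex(&f, sizeof f, buf) for a float f, the order of the hex pairs in buf reflects the platform's byte order, which is exactly why method (3) had to index backwards on a little-endian machine.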

The rules, by the way, are significantly different here in C than in C++. Strictly speaking, I believe, in C++ not even method (2) is legal, and the only way you're allowed to do this sort of thing any more is with method (4) and an explicit call to memcpy. (However, I'm told that optimizing compilers tend to treat calls to memcpy very specially these days, not only replacing explicit function calls with inline register moves, but sometimes even optimizing out the copy altogether, and doing something like method (1) or (2) internally, if they know they can get away with it.)

Are you sure about (2) violating strict aliasing? My understanding was that type punning via a union was legitimised in C99+TC3. (E.g., see footnote 82, attached to section 6.5.2.3, in n1256.pdf.) — Mark Dickinson, Feb 08 '23 at 18:45
@MarkDickinson No, I'm not sure! I can never remember, and I was hoping someone would correct me if necessary. I'll adjust the wording. — Steve Summit, Feb 08 '23 at 18:49
https://stackoverflow.com/q/55254998/2410359 and https://stackoverflow.com/a/98702/2410359 may help — chux - Reinstate Monica, Feb 08 '23 at 18:52
Concerning _endian_, another assumption is that `float` and `uint32_t` have the same endianness. They commonly do, yet exceptions exist. — chux - Reinstate Monica, Feb 08 '23 at 20:39
