
I know this has been asked before, but I couldn't find a good explanation of why this is undefined behaviour:

#include <stdio.h>
#include <stdint.h>

//Common union for both types
union float_int {
    float f;
    uint32_t i;
};

int main(void) {
    union float_int fi;
    //This should be problematic
    uint32_t* i_ptr = (uint32_t *)&fi.f;

    fi.f = 10.0f;
    printf("%f : %u\r\n", fi.f, fi.i); //Prints: 10.000000 : 1092616192 which is OK
    printf("%u\r\n", *i_ptr); //Prints: 1092616192 which is also OK

    return 0;
}

Looking at the memory representation, both types are 4 bytes long, so there is no out-of-bounds access or anything similar.

How is this undefined behaviour?

int main() {
    union float_int fi;
    void* v_ptr = &fi.f;
    uint32_t* i_ptr = (uint32_t *)v_ptr;
}

Is this code still undefined behaviour? I want to read a float number as a 32-bit unsigned integer.

Why is using memcpy the only available way of doing it?

Sourav Ghosh
unalignedmemoryaccess
  • `indefined` --> `undefined`? – Sourav Ghosh Jul 06 '17 at 06:49
  • @SouravGhosh Yes, Thanks. – unalignedmemoryaccess Jul 06 '17 at 06:49
  • @Stargateur please point me to the correct answer; I couldn't find it. – unalignedmemoryaccess Jul 06 '17 at 06:50
  • While type-punning through unions is explicitly allowed by the C specification (unlike C++), I'm not so sure about you using pointers to the "wrong" members. If you, in your first example, using e.g. `uint32_t* i_ptr = &fi.i` then that would be okay, even if `&fi.f` is the same address and memory. – Some programmer dude Jul 06 '17 at 06:56
  • Possible duplicate of [gcc, strict-aliasing, and casting through a union](https://stackoverflow.com/questions/2906365/gcc-strict-aliasing-and-casting-through-a-union) or/and [Is the strict aliasing rule incorrectly specified?](https://stackoverflow.com/questions/38798140/is-the-strict-aliasing-rule-incorrectly-specified) – Stargateur Jul 06 '17 at 06:56
  • @tilz0R try the following: `void testalias(union float_int *fi, uint32_t *i_ptr) { *i_ptr = 42; fi->f = 10.0f; printf("%f\n", fi->f); printf("%d\n", *i_ptr); } int main(void) { union float_int fi; uint32_t* i_ptr = (uint32_t *)&fi.f; testalias(&fi, i_ptr); return 0; }` This will print `10.000000` and some corresponding `int` as you expect (note that reading this `int` from the union is already undefined, it wasn't the last object stored) [cont'd] –  Jul 06 '17 at 07:21
  • @tilz0R then move the `testalias` function to a separate translation unit. At least with my gcc version and `-O3`, the result is that it prints `0.000000 42`. The compiler assumes the two pointers not to alias each other as they have incompatible types, so it can reorder the assignments "safely" (of course, it can't, due to the forbidden aliasing). –  Jul 06 '17 at 07:22
  • @FelixPalmen Yes, this will print 10.000000 as a float because the float member was the last one assigned. OK, so this is about optimization. – unalignedmemoryaccess Jul 06 '17 at 07:23
  • @tilz0R sure this is about optimization, *without* the strict aliasing rule, many optimizations wouldn't be possible. Just don't alias incompatible pointers (and don't read from a `union` member other than the member last written to, except when they are both structs with a common initial sequence) –  Jul 06 '17 at 07:25
  • @FelixPalmen if you check this code: `struct buff {int a; int b; int c;}; char tmp[sizeof(struct buff)]; struct buff b; b.a = 5; memcpy(tmp, &buff, sizeof(buff)); struct buff* ptr = (struct buff *)tmp; printf("%d\r\n", ptr->a);` Is it valid for you? For me it is and I can see this approach on many libraries (LwIP is one of them). Let's assume we have 2 same devices (MCU) and they communicate between with UART and they send structure as bytes. Is this undefined? – unalignedmemoryaccess Jul 06 '17 at 07:26
  • @tilz0R if you ever really **need** to access the representation of some object, use a `char *` for it -- this is the only thing allowed in C. –  Jul 06 '17 at 07:27
  • @tilz0R no, invalid as well, `tmp` is of type `char` and you're aliasing it with some struct pointer. Only the other way around is valid (aliasing anything *using* a `char` pointer). –  Jul 06 '17 at 07:28
  • If `tmp` were accessed through a `void *`, would everything be OK? `void *tmp2 = tmp; memcpy(tmp2, &buff, sizeof(buff));` and then `struct buff *ptr = tmp2`? If this is still not valid, how do all these libraries work using this approach? – unalignedmemoryaccess Jul 06 '17 at 07:30
  • @tilz0R regarding your edit, yes, this **is** undefined. It can still work if you **know** that your `char` buffer contains a valid representation and you won't have problems as long as you don't mix accesses through your `char` pointer and the casted one. But this way, you create code outside of the spec that will only work on your target platform. –  Jul 06 '17 at 07:32
  • Can we continue on chat @FelixPalmen? – unalignedmemoryaccess Jul 06 '17 at 07:32
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/148471/discussion-between-felix-palmen-and-tilz0r). –  Jul 06 '17 at 07:33
  • The [tag:union] tag is only for the SQL "union" keyword @tilzOR; the correct tag for c-style unions is [tag:unions]. For reference please see linked tag wikis. – hat Jan 09 '19 at 12:13

2 Answers


What you have here is a violation of the strict aliasing rule.

First, you are doing

 uint32_t* i_ptr = (uint32_t *)&fi.f;   //converting to a non-character type pointer

and then you access the object through that pointer:

  printf("%u\r\n", *i_ptr);   //access value via incompatible lvalue expr.

which causes the issue: `float` and `uint32_t` are not compatible types, and `uint32_t` is not a character type, so none of the permitted cases below applies.

Quoting C11, chapter §6.5/P7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.
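As for the alternatives the question asks about, here is a minimal sketch of two ways to get at the bits without dereferencing a `uint32_t *` that points at a `float`. It assumes `float` and `uint32_t` are both 32 bits wide (checked with `static_assert`), and the function names are hypothetical. `memcpy` copies the object representation byte by byte, and reading a union member other than the one last stored is allowed in C (C11 §6.5.2.3, footnote 95), though not in C++.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is exactly 32 bits wide on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Same union as in the question. */
union float_int {
    float f;
    uint32_t i;
};

/* Copy the object representation of f into a uint32_t.
   memcpy works on bytes, so no float is read through a uint32_t lvalue. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Store through one union member and read through the other.
   C permits this type punning (C11 6.5.2.3, footnote 95); C++ does not. */
uint32_t float_bits_union(float f)
{
    union float_int fi;
    fi.f = f;
    return fi.i; /* member access, not *(uint32_t *)&fi.f */
}
```

On a platform with IEEE 754 `float`, both functions return 1092616192 for `10.0f`, the same number the question prints.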


In reply to the comment, see C11, chapter §6.2.6.1:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

and

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...] Such a representation is called a trap representation.
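The two quoted paragraphs can be illustrated with a minimal sketch (the names `float_to_bytes` and `print_float_bytes` are hypothetical): reading an object's bytes through an `unsigned char *` is the character-type case of §6.5/P7, and the bytes obtained are the object representation §6.2.6.1 describes. This is the only direction character types help with; accessing a `char` buffer through a `uint32_t *` or struct pointer is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copy the bytes of *f into out[] by reading them through an
   unsigned char pointer, which is allowed for any object (C11 6.5/P7). */
void float_to_bytes(const float *f, unsigned char out[sizeof(float)])
{
    assert(f != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)f;
    for (size_t k = 0; k < sizeof *f; k++)
        out[k] = p[k];
}

/* Print the bytes in memory order. The order depends on endianness:
   for 10.0f a little-endian IEEE 754 machine shows 00 00 20 41. */
void print_float_bytes(float f)
{
    unsigned char bytes[sizeof(float)];
    float_to_bytes(&f, bytes);
    for (size_t k = 0; k < sizeof bytes; k++)
        printf("%02x ", (unsigned)bytes[k]);
    putchar('\n');
}
```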

Sourav Ghosh
  • I agree that they are not, but both are 4 bytes long and both refer to the same memory. The CPU doesn't care what is there; it's just a matter of how you read and interpret it. That's strange to me, and it's even stranger that only `memcpy` can resolve this. Thanks anyway. – unalignedmemoryaccess Jul 06 '17 at 06:54
  • @tilz0R OK, let me expand. – Sourav Ghosh Jul 06 '17 at 06:55
  • I hope one of the answers also covers casting a buffer; it's a similar situation. Thanks for the update. – unalignedmemoryaccess Jul 06 '17 at 06:59
  • `struct buff {int a; int b; int c;}; char tmp[sizeof(struct buff)]; struct buff b; b.a = 5; memcpy(tmp, &buff, sizeof(buff)); struct buff* ptr = (struct buff *)tmp; printf("%d\r\n", ptr->a);` Is this valid code? Let's assume values from `tmp` were received by UART from the same device (both identical devices communicating from UART, one is sending structure, second receiving the same one). – unalignedmemoryaccess Jul 06 '17 at 07:15
  • @tilz0R looks invalid, what is `&buff`? you meant `&b`? – Sourav Ghosh Jul 06 '17 at 07:20
  • Yes, I meant `&b`. – unalignedmemoryaccess Jul 06 '17 at 09:04
  • @tilz0R even in that case, I believe, `char *` (or, more generally, a pointer to any character type) can alias everything, not the other way around. – Sourav Ghosh Jul 06 '17 at 09:05
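For the UART buffer case discussed in the comments, a minimal sketch of a well-defined alternative is to copy the received bytes into a real `struct buff` object instead of casting the `char` buffer. The helper `buff_from_bytes` is hypothetical, and it assumes both devices agree on `int` size, byte order and struct padding, which the C standard does not guarantee across platforms. Copying also avoids a second problem with the cast: `tmp` is not guaranteed to be suitably aligned for `struct buff`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Structure from the comments above. */
struct buff {
    int a;
    int b;
    int c;
};

/* Rebuild a struct buff from raw bytes (e.g. received over UART).
   memcpy writes into a genuine struct buff object, so no char array
   is accessed through a struct lvalue and alignment is not an issue.
   Returns 1 on success, 0 if the buffer is too short. */
int buff_from_bytes(const unsigned char *bytes, size_t len, struct buff *out)
{
    assert(bytes != NULL && out != NULL);
    if (len < sizeof *out)
        return 0;
    memcpy(out, bytes, sizeof *out);
    return 1;
}
```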

Float and integer representations differ from one another, which is why defining such a union is not a good use case.

In your example, you cast from `void *` to `uint32_t *`. The cast happens at the pointer level: `i_ptr` points to the same location in memory, now viewed as an integer, without changing the bits themselves.

To summarize, if you want a conversion, you need to convert the value itself, which changes the internal representation. For example:

printf("%u\r\n", (uint32_t)fi.f);   // value conversion: prints 10, not 1092616192
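To make the difference between converting a value and reinterpreting its bits concrete, here is a minimal sketch with hypothetical function names, assuming an IEEE 754 32-bit `float`. Casting the value, as in `(uint32_t)f`, discards the fractional part (C11 §6.3.1.4), so `10.0f` becomes 10; copying the representation yields 1092616192, the number the question prints.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: the fractional part is discarded (C11 6.3.1.4).
   Undefined if the integral part does not fit in uint32_t, e.g. -1.0f. */
uint32_t float_value_as_u32(float f)
{
    return (uint32_t)f;
}

/* Representation copy: the bits are reinterpreted, not converted. */
uint32_t float_bits_as_u32(float f)
{
    static_assert(sizeof f == sizeof(uint32_t), "float must be 32 bits");
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```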
ibezito
  • "defining a union, such that ... is not possible": this doesn't make sense. It would mean that no union is possible at all! –  Jul 06 '17 at 07:01
  • @YvesDaoust agree, I rephrased it to be more accurate – ibezito Jul 06 '17 at 07:06