0

I know violating the strict-aliasing rule is Undefined Behavior as per the C standard. Please don't tell me it is UB and there is nothing to talk about.

I'd like to know if there are compilers which won't have the expected behavior (defined by me below) for the following code.

Assume the size of float and int is 4 bytes, and a big-endian machine.

float f = 1234.567;  /* Any value here */
unsigned int u = *(unsigned int *)&f;

My expected behavior in English words is "get the four bytes where the float is stored and put them in an int as is". In code it would be this (I think there is no UB here):

float f = 1234.567;  /* Any value here */
unsigned char *p = (unsigned char *)&f;
unsigned int u = (p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
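
As one of the comments notes, the shifts themselves can be a problem: each p[n] is promoted to int before shifting, so p[0] << 24 is not representable in a 32-bit int when the top bit of p[0] is set. A minimal sketch that avoids this by widening to an unsigned type first is shown here. The helper name load_be32 is hypothetical, and uint32_t is assumed to be available from <stdint.h>:

```c
#include <stdint.h>

/* Hypothetical helper: assemble four bytes, most significant first,
   into a 32-bit unsigned value. Each byte is converted to uint32_t
   before shifting so that no shift is performed on a signed int. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

On a big-endian machine with a 4-byte float, load_be32((unsigned char *)&f) would then yield the same bits the question describes.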

I'd also welcome practical and concrete examples of why, apart from being UB as per the standard, a compiler would have what I consider an unexpected behavior.

atturri
  • 5
    What is your question? Are you trying to define the behaviour of _undefined behaviour_? Your very first sentence already states it clear. Note also your shifts invoke UB for certain values, too. – too honest for this site Apr 22 '16 at 21:13
  • Some compilers have a *defined* behavior on certain c UB cases. I'd like to see an example of a compiler having a behavior different from my expected one in this case. – atturri Apr 22 '16 at 21:15
  • http://stackoverflow.com/questions/20762952/most-efficient-standard-compliant-way-of-reinterpreting-int-as-float – fukanchik Apr 22 '16 at 21:15
  • 8
    You should ask your compiler vendor. – Kerrek SB Apr 22 '16 at 21:16
  • 2
    @fukanchik: C and C++ are **different** languages! – too honest for this site Apr 22 '16 at 21:16
  • 1
    @Olaf are you saying `memcpy` wouldn't be the right answer? – fukanchik Apr 22 '16 at 21:18
  • @fukanchik it is a good read but no it doesn't answer the question. I'm asking about this exact code, just curiosity – atturri Apr 22 '16 at 21:19
  • Why don't you simply try it out (often enough, so you can be very sure it *always* behaves like you think it should and did the first time :) ) – tofro Apr 22 '16 at 21:28
  • I have it tested thoroughly in some compilers, with my expected behavior. But the wise people here might have counterexamples, or even reasons for having another behavior. – atturri Apr 22 '16 at 21:31
  • 1
    Jokes apart: Most (the older, the more) will behave like you expect. Some will not, some will cease to do that in their next minor release. What good would that knowledge be for? What good could any code be that *relies* on such stuff? – tofro Apr 22 '16 at 21:34
  • 2
    I went down the strict aliasing rabbit hole not very long ago and found this. It gets into the assembly and shows why UB can happen from violating strict aliasing .. if that's what you're asking: http://dbp-consulting.com/tutorials/StrictAliasing.html – yano Apr 22 '16 at 21:37

3 Answers

8

On most compilers, it will do what you expect until the optimizer decides to dead-code eliminate or move the assignment to f.

This makes it essentially impossible to test if any given compiler will always do what you expect -- it might work for one particular program, but then a slightly different one might fail. The strict-aliasing rule is basically just telling the compiler implementor "you can rearrange and eliminate these things fairly freely by assuming they never alias". When it's not useful to do things that would cause this code to fail, the optimizer probably won't, so you won't see a problem.

Bottom line is that it is not useful to talk about "which compilers this will sometimes work on", as it might suddenly stop working in the future on any of them if something seemingly unrelated changes.
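
To make the "assume they never alias" point concrete, here is a minimal sketch of the kind of code where that assumption matters. The function name store_both is hypothetical and the example is illustrative only, not taken from any particular compiler's documentation:

```c
/* Hypothetical function: int and float are different types, so under
   the strict-aliasing rule the compiler may assume that *i and *f
   never refer to the same object. */
static int store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;   /* assumed not to modify *i */
    return *i;   /* an optimizer is free to turn this into "return 1" */
}
```

Called with pointers to two distinct objects, this always returns 1. Called with two pointers to the same storage (which is the aliasing violation under discussion), the result is not defined: one build may reload *i and see the bits of 0.0f, while another may return the constant 1, and which one you get can change with optimization level or surrounding code.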

Chris Dodd
1
float f = 1234.567;  /* Any value here */
unsigned int u = *(unsigned int *)&f;

Some plausible reasons why this would not work as expected are:

  1. float and unsigned int are not the same size. (I've worked on systems where int is 64 bits and float is 32 bits. I've also worked on systems where both int and float are 64 bits, so your assumption that 4 bytes are copied would fail.)

  2. float and unsigned int have different alignment requirements. Specifically, if unsigned int requires stricter alignment than float, and f happens not to be aligned strictly enough for unsigned int, reading f as if it were an unsigned int could do Bad Things. (This is probably unlikely if int and float are the same size.)

  3. A compiler might recognize that the code's behavior is undefined, and for example optimize away the assignment. (I don't have a concrete example of this.)

If you want to copy the representation of a float into an unsigned int, memcpy() is safer (and I'd first check that they actually have the same size). If you want to examine the representation of a float object, the canonical way to do that is to copy it to an array of unsigned char. Quoting the ISO C standard (6.2.6.1p4 in the N1570 draft):

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
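
A minimal sketch of both suggestions, assuming a C11 compiler and that uint32_t exists on the target. The helper names float_bits and float_bytes are hypothetical:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a float into
   a uint32_t. memcpy is allowed to access any object as bytes, so no
   aliasing rule is violated. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    static_assert(sizeof(float) == sizeof(uint32_t),
                  "float and uint32_t must have the same size");
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: copy the representation into an array of
   unsigned char so that individual bytes can be examined. */
static void float_bytes(float f, unsigned char out[sizeof(float)])
{
    memcpy(out, &f, sizeof f);
}
```

The value returned by float_bits uses the machine's native byte order, so it matches the byte-by-byte assembly in the question only on a big-endian machine. Compilers commonly reduce a fixed-size memcpy like this to a single register move, though that is an optimization and not something the standard promises.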

Keith Thompson
  • I explicitly asked for an assumption for both being 4 bytes long, and was also thinking about them being equally aligned. But good point on 3, and interesting reference to the standard. Thanks. – atturri Apr 23 '16 at 08:01
0

You're invoking undefined behavior for no reason at all.

Will this strict-aliasing rule violation have the behavior I expect?

No. And you don't need to expect anything, because you can write much better looking code.

This has defined behavior that you'd like:

#include <assert.h>
#include <stdint.h>

typedef union {
  float f;
  uint32_t i;
} ufi_t;
assert(sizeof(float) == sizeof(uint32_t));

ufi_t u = { 123.456f };
uint32_t i = u.i;

You can factor it out; decent compilers will generate no code for it:

static inline uint32_t int_from_float(float f) {
  ufi_t u = { f };
  return u.i;
}

You can also cast from (float *) to (ufi_t *) safely. So:

float f = 123.456;
uint32_t i = ((ufi_t*)&f)->i;

Note: language lawyers are welcome to set me straight on this last one, but that's what I make of ISO/IEC 9899:201x 6.5 and so on.

Kuba hasn't forgotten Monica
  • Nothing in the Standard would forbid an implementation from imposing an alignment requirement on union types which is coarser than the alignment requirement for any member contained therein. – supercat Jan 09 '17 at 17:12
  • @supercat Yep. And by golly would a lot of code break if someone did that :/ – Kuba hasn't forgotten Monica Jan 09 '17 at 17:13
  • Given that the `union` trick isn't going to be reliable in the general case, compiler writers could easily regard the permission to use pointers of struct and union types as only allowing objects *with no declared type* to be created by writing their constituents. If some storage with no declared type is written with an `int`, the rule would allow a compiler to retroactively regard its effective type as being a union containing that `int`. Such allowance wouldn't work, though, if the object had a *declared* type of `int`. – supercat Jan 09 '17 at 17:30