10

Consider this program:

#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.x = 5;
    a.y = 3.2;
    printf("%d\n%.2Lf", a.x, a.y);
    return 0;
}

Output:

-858993459
3.20

This is fine, as the int member gets interpreted using some of the bits of the long double member. However, the reverse doesn't really apply:

#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.y = 3.2;
    a.x = 5;
    printf("%d\n%.2Lf", a.x, a.y);
    return 0;
}

Output:

5
3.20

The question is: why doesn't the long double get reinterpreted as some garbage value, given that 4 of its bytes should now represent the integer? This is not a coincidence: the program outputs 3.20 for all values of a.x, not just 5.

timrau
DarkAtom
  • Isn't it all just undefined behavior? The only guarantee is that it'll behave as expected when the type used for the last store matches the type used for any subsequent loads. – Brendan Sep 20 '19 at 17:55
  • What CPU are you targeting? – nicomp Sep 20 '19 at 17:56
  • That's true, the standard only guarantees that. I am trying to see what happens at bit level. – DarkAtom Sep 20 '19 at 17:56
  • @DarkAtom That's likely only affecting the last few bits of the mantissa. It probably won't make any noticeable difference. – S.S. Anne Sep 20 '19 at 17:58
  • Since you are using a long double (10 bytes) and an int (4 bytes, I assume), the int only clobbers 4 bytes of the long double. Those 4 bytes do not affect what you're printing when you use the %.2Lf format specifier. – nicomp Sep 20 '19 at 18:00
  • @JL2210 I think you are correct. Printing with more decimal places will likely display the difference. – Michael Choi Sep 20 '19 at 18:02
  • @nicomp long double as 10 bytes? On what arch? – Michael Choi Sep 20 '19 at 18:03
  • @MichaelChoi x86. It is an 80-bit IEEE-754 extended long double format – S.S. Anne Sep 20 '19 at 18:03
  • @MichaelChoi Good question! I just searched for it and found this: https://www.tutorialspoint.com/cprogramming/c_data_types.htm. That's the semi-authoritative source I used in my comment. – nicomp Sep 20 '19 at 18:04

4 Answers

8

However, the reverse doesn't really apply

On a little endian system (least significant byte of a multi-byte value is at the lowest address), the int will correspond to the least significant bits of the mantissa of the long double. You have to print that long double with a great deal of precision to see the effect of that int on those insignificant digits.

On a big endian system, like a PowerPC box, things would be different: the int part would line up with the most significant part of the long double, overlapping with the sign bit, exponent and most significant mantissa bits. Thus changes in x would have drastic effects on the observed floating-point value, even if only a few significant digits are printed. However, for small values of x, the value appears to be zero.

On a PPC64 system, the following version of the program:

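#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main(void)
{
    union myUnion a;
    a.y = 3.2;
    for (int i = 0; i < 1000; i++) {
        a.x = i;   /* overwrite the first sizeof(int) bytes of the union */
        printf("%d -- %.2Lf\n", a.x, a.y);
    }
    return 0;
}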

prints nothing but

1 -- 0.00
2 -- 0.00
[...]
999 -- 0.00

This is because we're creating an exponent field with all zeros, giving rise to values close to zero. However, the initial value 3.2 is completely clobbered; it doesn't just have its least significant bits ruffled.

Kaz
  • Building on this answer, change the format specifier in the printf for the long double. – nicomp Sep 20 '19 at 17:59
  • Do you actually have a PowerPC64? – S.S. Anne Sep 20 '19 at 18:19
  • @JL2210 Yes; I ran the above program. – Kaz Sep 20 '19 at 18:20
  • Cool! What operating system do you run on it? – S.S. Anne Sep 20 '19 at 18:21
  • Interesting analysis, but you fail to specify that this behavior is actually undefined – chqrlie Sep 20 '19 at 21:02
  • @chqrlie GCC defines it: https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Optimize-Options.html#Type-punning and the documentation notes it as a common practice: *The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.* I mostly don't waste my time on academics. – Kaz Sep 20 '19 at 21:41
  • @chqrlie I have only a copy of C99, where it was still *unspecified behavior*, not *undefined*. If you read between the lines, that's where the implementation is supposed to support type punning, if anywhere. – Kaz Sep 20 '19 at 21:44
  • @Kaz: indeed gcc explicitly allows type punning via unions, which seems to have made its way into recent versions of the C Standard, but the observed behavior is different from what gcc refers to: the OP is rereading the long double after modifying only part of its value via another type, something explicitly described as potentially undefined even in recent versions of the C Standard. – chqrlie Sep 20 '19 at 21:46
6

A long double is large compared to an int. On implementations where x lines up with the least significant bits of the mantissa of y, and where the other bytes of the union are not affected by a store to x, you need to print the value with much higher precision to see the effect of modifying x.

Mohit Jain
  • In C this is *not necessarily* undefined. The accepted answer in the referenced question says this is allowed although it may be a trap representation. If it's not a trap representation, the result is well defined. – dbush Sep 20 '19 at 18:11
  • You linked to a C++ question. – S.S. Anne Sep 20 '19 at 18:16
  • @JL2210 accepted answer contains relevant quote from C11. – Mohit Jain Sep 20 '19 at 18:22
  • @dbush thanks for catching that. You are right. I fixed the answer. – Mohit Jain Sep 20 '19 at 18:26
  • @MohitJain: where does the accepted answer provide a relevant quote? None of them do. – chqrlie Sep 20 '19 at 21:05
  • @chqrlie This discussion roots from an older version of my answer. In short we are talking about accepted answer on https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior which quotes #6.5.2.3 from C11. – Mohit Jain Sep 21 '19 at 11:51
  • @MohitJain: OK. Very good reference, I had upvoted it along with its finely documented award winning answer. I will add a reference to my answer to this question. – chqrlie Sep 22 '19 at 08:42
4

This only affects the lower half of the mantissa. It won't make any noticeable difference with the number of digits you're printing. However, the difference can be seen when you print 64 digits.

This program will show the difference:

#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.y = 3.2;
    a.x = 5;
    printf("%d\n%.64Lf\n", a.x, a.y);
    a.y = 3.2;
    printf("%.64Lf\n", a.y);
    return 0;
}

My output:

5
3.1999999992549419413918193599855044340074528008699417114257812500
3.2000000000000001776356839400250464677810668945312500000000000000

Based on my knowledge of the 80-bit long double format, this overwrites half of the mantissa, which doesn't skew the result much, so this prints somewhat accurate results.

If you had done this in my program:

a.x = 0;

the result would've been:

0
3.1999999992549419403076171875000000000000000000000000000000000000
3.2000000000000001776356839400250464677810668945312500000000000000

which is only slightly different.

S.S. Anne
  • Interesting analysis, but you fail to specify that this behavior is actually undefined – chqrlie Sep 20 '19 at 21:01
  • @chqrlie Note the comment on the OP's question: *"That's true, the standard only guarantees that. I am trying to see what happens at bit level."*. I did what the OP asked, without the whole "undefined behavior" boilerplate that everyone else is giving. – S.S. Anne Sep 21 '19 at 03:08
  • Indeed this point is mentioned in the comments. But it does not cost much to include such a remark in the answer for casual readers to get a full answer without the need to scan all the comments. UV. – chqrlie Sep 21 '19 at 08:44
-1

Answers posted by Mohit Jain, Kaz and JL2210 provide good insight to explain your observations and investigate further, but be aware that the C Standard does not guarantee this behavior:

6.2.6 Representations of types
6.2.6.1 General

6   When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

7   When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

As a consequence, the behavior described in the answers is not guaranteed: setting the int x member could modify all the bytes of the long double y member, including the bytes that are not part of the int. These bytes can take any value, and the contents of y could even be a trap representation, causing undefined behavior.

As commented by Kaz, gcc is more precise than the C Standard. Its documentation notes: "The practice of reading from a different union member than the one most recently written to (called type-punning) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type." This practice is actually condoned in the C Standard since C11, as documented in this answer: https://stackoverflow.com/a/11996970/4593267 . Yet in my reading of this footnote, there is still no guarantee about the bytes of y that are not part of x.

Community
chqrlie