Arithmetic for two unions, undefined behavior?

Question

#include <stdio.h>
#include <stdlib.h>

typedef union
{

    double f;

    unsigned long long u;

    int long long i;
} r;

int main()
{
  r var1, var2;

  var1.f = -3.5;
  var2.u = 3;

  var1.u = var1.u + var2.u;

  printf("%u", var1.u);
  return 0;
}

Why is this returning only the value of var1 and not the summation? It works if var1 and var2 are added with the same assigned datatype. I thought union made that a non issue?

Should use `printf("%llu", var1.u);` to see all of `var1.u`. — chux - Reinstate Monica, Jan 16 '14 at 02:31
Technically, it's undefined behaviour to write to one field of a union, then read another field without having written to that field. Technically. EDIT: Didn't see Barmar's answer, sorry — A Person, Jan 16 '14 at 02:46
@APerson as I stated in my answer type-punning through unions has been legal since C89. — Shafik Yaghmour, Jan 16 '14 at 03:26
@chux actually that is the main problem, there is no undefined behavior b/c of punning. You should have posted an answer. — Shafik Yaghmour, Jan 16 '14 at 03:43
@Shafik Yaghmour Yes, but enough water had already passed under the bridge. Just don't type fast. — chux - Reinstate Monica, Jan 16 '14 at 03:48

score 3 · Answer 1 · edited May 23 '17 at 12:02


Reading from a different member of a union than the one that you last assigned results in an unspecified value. It's not invalid, but the standard doesn't specify how the type punning will be resolved, and the result might be a trap representation. See:

Is type-punning through a union unspecified in C99, and has it become specified in C11?

The purpose of unions isn't to allow type punning. They let you save space by reusing the same memory for two different variables when you know you'll never need them both at the same time. For an example where this is useful, see:

How can a mixed data type (int, float, char, etc) be stored in an array?

(that happens to be my highest-voted answer).

answered Jan 16 '14 at 02:18 by Barmar; edited May 23 '17 at 12:02 by Community

That isn't always true in c. – this Jan 16 '14 at 02:22
@user2986109 Not at all - despite the strict rules, unions remain very useful, as long as you know what field you should read. One way to know is to store a flag that indicates what fields you wrote last. – Sergey Kalinichenko Jan 16 '14 at 02:25
I am trying to add the two values, I know it doesn't differentiate the ALU or FPU for the add op, but is there a way to add them regardless? – user2986109 Jan 16 '14 at 02:31
What two values are you trying to add? If you want to convert a `double` to `unsigned long long` you should use a cast, not a union. – Barmar Jan 16 '14 at 02:33
If you want to add different types without conversion, take the address, cast it to the other pointer type, and indirect through that. – Barmar Jan 16 '14 at 02:34
Yes, without conversion, without flags if possible – user2986109 Jan 16 '14 at 02:35
The reason these things might not work is because different data types have different alignment requirements. – Barmar Jan 16 '14 at 02:37
As long as I declare vars aligned it should be okay though? – user2986109 Jan 16 '14 at 02:49
There are no guarantees, that's what undefined behavior means. – Barmar Jan 16 '14 at 02:53
There are some exotic computer architectures where integers and floats may be stored in different areas of memory, so type punning isn't possible. – Barmar Jan 16 '14 at 02:54
Actually, C probably couldn't work on such an architecture, because `malloc()` has to return memory that can be used for any type. But there are still some weird implementations where type punning won't do intuitive things. – Barmar Jan 16 '14 at 02:56

This is wrong, as [Pascal Cuoq's quote here](http://stackoverflow.com/questions/2310483/purpose-of-unions-in-c-and-c#comment26826326_2313676) says this has been legal in C since C89. I used to believe the same until I answered [this](http://stackoverflow.com/questions/20922609/why-does-optimisation-kill-this-function/20956250#20956250) question and was promptly corrected. – Shafik Yaghmour Jan 16 '14 at 03:06
This should not be the accepted answer as its premise is false. Type punning via unions has never been undefined behavior in any C standard. In fact it has been considered everything but undefined -- implementation defined in C89, unspecified in C99, and now well defined in C11. – tab Jan 16 '14 at 04:17
I've updated my answer to state that the resulting value is unspecified. In the context of this question, that's sufficient to explain the misbehavior the OP is seeing. – Barmar Jan 16 '14 at 04:48

score 2 · Accepted Answer · edited May 23 '17 at 10:30

Type-punning through unions has been legal since C89, so there is no undefined behavior there. Several compilers explicitly guarantee that it works; for example, see the gcc documentation on type-punning. They need to state this because in C++ it is not as clear cut.

But this line sure does have undefined behavior:

printf("%u", var1.u);

The type of var1.u is unsigned long long and so the correct format specifier should be %llu, and clang duly complains as follows:

warning: format specifies type 'unsigned int' but the argument has type 'unsigned long long' [-Wformat]

printf("%u", var1.u);
        ~~   ^~~~~~
        %llu

Once you fix that, the output I see is this (see it live):

13838435755002691587

which shows that the changes to both variables are having an effect.

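The results you see follow from the IEEE 754 double-precision (binary64) format, which packs a value into 64 bits like this:

[Figure: IEEE 754 double-precision layout, from the most significant bit down: 1 sign bit, 11 exponent bits, 52 fraction bits]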

This is one of the several examples showing the hex representation of a number:

3ff0 0000 0000 0002₁₆ ≈ 1.0000000000000004

c000 0000 0000 0000₁₆ = –2

So in your case, assigning a negative number to var1.f sets at least one high bit (the sign bit). We can explore this easily in C++ using std::bitset and gcc, since gcc explicitly supports type-punning via unions in C++:

#include <iostream>
#include <iomanip>
#include <bitset>
#include <string>

typedef union
{
    double f;
    unsigned long long u;
    int long long i;
} r;

int main() 
{
    r var1, var2;

    var1.f = -2 ; // High bits will be affected, so we expect a large number for u.
                  // Used -2 since we know what the bits should look like from the
                  // example in Wikipedia
    std::cout << var1.u << std::endl ;

    std::bitset<sizeof(double)*8> b1( var1.u ) ;
    std::bitset<sizeof(double)*8> b2( 13835058055282163712ull ) ;

    std::cout << b1 << std::endl ;
    std::cout << b2 << std::endl ;

    var2.u = 3;

    var1.u = var1.u + var2.u; // Low bits will be affected, so we expect a fraction
                              // to appear in f

    std::cout << std::fixed << std::setprecision(17) <<  var1.f << std::endl ;

    std::bitset<sizeof(double)*8> b3( var1.u ) ;
    std::bitset<sizeof(double)*8> b4( 13835058055282163715ull ) ;

    std::cout << b3 << std::endl ;
    std::cout << b4 << std::endl ;

    return 0;
}

The results I see are (see it live):

13835058055282163712
1100000000000000000000000000000000000000000000000000000000000000
1100000000000000000000000000000000000000000000000000000000000000
-2.00000000000000133
1100000000000000000000000000000000000000000000000000000000000011
1100000000000000000000000000000000000000000000000000000000000011

I saw the same thing with the integers but the float printout was not the summation. http://coliru.stacked-crooked.com/a/0d53d3f4a06c3e53 — user2986109, Jan 16 '14 at 03:51
@user2986109 you had a typo, [fixed it](http://coliru.stacked-crooked.com/a/1dfe1ace6dbcc2f2) — Shafik Yaghmour, Jan 16 '14 at 04:20
@user2986109 What you're getting is the result of adding 3 to the IEEE 754 binary representation of -3.5, which at the default printf precision isn't enough to see (since it's very small): http://coliru.stacked-crooked.com/a/c817211fb8ca6ebd — tab, Jan 16 '14 at 04:39
@tab makes sense, I did not have enough time to think that over so I did not attempt to explain. I only answered b/c the current answers were not correct. — Shafik Yaghmour, Jan 16 '14 at 04:44
Shafik, what was the typo that you fixed? Is it the result passed to printf() or the change of addend var1.f, or both? — user2986109, Jan 16 '14 at 04:58
I think the answer is this: Assigning floats into the union and processing them as integer is default behavior. Assigning integers then processing as float is not, and implicitly generates a typecast. — user2986109, Jan 16 '14 at 05:45
@user2986109 you are adding integers but you are printing out a `double`. In this case you are affecting the low bits, which affects the fractional part of the IEEE number. If you change the `printf` in your code to `printf("%.17f", var1.f);` you will see the effect. — Shafik Yaghmour, Jan 16 '14 at 15:46
@user2986109 or the change I did which was to add to the `double` and set the `double` like this `var1.f = var1.f + var2.u;` obtains the results I think you were intending. — Shafik Yaghmour, Jan 16 '14 at 15:48
thanks for your answer shafik :) I am only using union to try to expose more operations to the (bit)data. Is there a way to use pointers and addresses to do the same thing? — user2986109, Jan 18 '14 at 04:37
@user2986109 what kind of operations? You probably should ask a new question though. — Shafik Yaghmour, Jan 19 '14 at 01:40

user3159253 · Answer 3 · 2014-01-16T02:40:28.100

Umm.

In a union, the fields reside in the same physical space. That is, the size of a union is roughly the size of its biggest field. You assign to a float field of a union and then try to use it as an integer value. This leads to undefined behaviour; more precisely, the behaviour depends on the representation of integer and float numbers on the target platform.

Frankly speaking, you may use this trick for certain kinds of conversions (e.g. if you need to "convert" a pair of machine words to a single dword), but every time you use this technique you should clearly understand the gory details of the target CPU architecture.

A friend of mine once got spurious segfaults on SPARC computers because he tried to access non-aligned data using similar techniques :)

The example:

alex@galene ~/tmp $ cat test_union.c 
#include <stdio.h>

typedef union {
        float f;
        unsigned long long ull;
} csome;

int main(void) {
        csome cs;
        csome cs2;
        printf("&f = %p, &ull = %p\n", (void *)&cs.f, (void *)&cs.ull); /* %p expects void * */
        cs.f = 3.5;
        cs2.ull = 3;
        cs2.ull = cs.ull + cs2.ull;
        printf("cs2.ull = %llu\n", cs2.ull); /* %llu matches unsigned long long */
        return 0;
}
alex@galene ~/tmp $ cc -Wall -o test_union test_union.c
alex@galene ~/tmp $ ./test_union 
&f = 0xbfee4840, &ull = 0xbfee4840
cs2.ull = 1080033283

As you may see the value of cs2.ull is "random"

If assigning the integer any value doesn't properly translate to the float value, that doesn't matter, it should still add what is there right? — user2986109, Jan 16 '14 at 02:24
Well, if you really wish to understand what happens here, try to check the _addresses_ of both the f and u fields. Actually, I had a quick example (see the answer, comments aren't suitable for code :) ) — user3159253, Jan 16 '14 at 02:41
