Output of union using if statement

Question

Hey, I have a homework problem on this C code:

#include<stdio.h>

typedef union{
    char var1;
    int var2;
    float var3;
}data;

int main()
{
    data mydata;

    mydata.var1 = 'B';
    mydata.var2 = 12;

    if(mydata.var1 == 'B')
        mydata.var3 = 3.5;
    else
        mydata.var3 = 7.1;

    printf("%.1f", mydata.var3);
    return 0;
}

The output is 7.1. I was wondering if someone could explain why the output is 7.1 and not 3.5.

Cheers for your help.

This code has undefined behavior. You're only allowed to read the member of a union that you wrote last. You can't write to `mydata.var2` and then read `mydata.var1`. — Barmar, Jun 13 '19 at 02:37
It seems like you didn't really understand the part of the lesson that explained the difference between `union` and `struct`. Go back to your textbook and study some more. — Barmar, Jun 13 '19 at 02:38
The key point is that all the union members share the same memory. When you do `mydata.var2 = 12;`, you overwrite the memory used for `mydata.var1`, so it no longer contains `'B'`. — Barmar, Jun 13 '19 at 02:41
Well, despite having undefined behavior in C, this is common practice in embedded systems and I believe is supported as a gcc extension, and is commonly used to serialize data, such as packed structs, into a byte array in order to transmit over serial or other communication interfaces. It's also a common means of type punning, or casting from one type to another, in order to see how variables are really stored at the byte level. Note that both serialization and type punning can also be achieved using pointer casts and dereferencing as an alternative to using unions. — Gabriel Staples, Jun 13 '19 at 02:42
@GabrielStaples Embedded code often uses undefined behavior, they just depend on a specific implementation. — Barmar, Jun 13 '19 at 02:46
Pretty much any form of type punning is technically undefined behavior. — Barmar, Jun 13 '19 at 02:47
@GabrielStaples - the C of today is no longer the C of yesteryear. BITD (Back In The Day (tm)) rule-obsessed programmers used Pascal, and C programmers laughed at them. Now it seems that Pascal won. Woe! Woe unto us all! :-) — Bob Jarvis - Слава Україні, Jun 13 '19 at 02:49
@GabrielStaples there is a distinction to be made. Many embedded systems are **freestanding environments** and by definition are *implementation defined*, see [C11 Standard - 5.1.2.1 Freestanding environment](http://port70.net/~nsz/c/c11/n1570.html#5.1.2.1) So what may be common and defined in a freestanding environment may be undefined in a hosted environment. — David C. Rankin, Jun 13 '19 at 02:53
@Barmar I don't know of anywhere in the C Standard that says this is undefined behavior. A footnote in 6.5.2.3 (describing the `.` and `->` operators) says "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6... This might be a trap representation." I think it's unspecified behavior. — aschepler, Jun 13 '19 at 03:20
@Barmar: Reading a union member other than the last one written to is not undefined behavior in C. The bytes are reinterpreted as the accessed type. This results in behavior that depends on certain aspects of the implementation, but it is not undefined behavior as that term is defined by the C standard. — Eric Postpischil, Jun 13 '19 at 03:32
@aschepler: I do not think it is unspecified behavior. That term applies to things like the values of uninitialized objects: They may have any value, and it is unspecified which value any instance has. The reinterpretation of bytes as a new type is, aside from issues about contents of padding, a property of the C implementation; it should not vary from instance to instance in an implementation. — Eric Postpischil, Jun 13 '19 at 03:46
OK, everyone, it's implementation-defined, not undefined. See https://stackoverflow.com/questions/52290456/is-the-following-c-union-access-pattern-undefined-behavior — Barmar, Jun 13 '19 at 07:18
Yeah, but how about the fact that one member differs in size from the other 2? You can set the other 2 members to some value that the other member cannot represent... — bigwillydos, Jun 13 '19 at 21:03
Thinking about it further, you can set the `var3` value to something that neither `var1` nor `var2` can represent... — bigwillydos, Jun 13 '19 at 21:11

score 4 · Accepted Answer · answered Jun 13 '19 at 03:04

The first thing you should know is that a union allocates one common storage space for all its members, so it can hold the value of only one member at a time.

In your example, 'B' is assigned to the union member `mydata.var1`, so the shared storage holds 'B'. Then 12 is assigned to `mydata.var2`, which writes the int value 12 into that same storage and overwrites the 'B' (a union holds only one member's value at a time). When the if statement then reads `mydata.var1`, it sees one byte of the int 12 rather than 'B', which is why the else branch runs in your program. If you want to print 3.5 instead of 7.1, use a struct instead of a union, because a struct allocates separate space for each member.

#include <stdio.h>

typedef struct{
    char var1;
    int var2;
    float var3;
}data;

int main()
{
    data mydata;

    mydata.var1 = 'B';
    mydata.var2 = 12;

    if(mydata.var1 == 'B')
        mydata.var3 = 3.5;
    else
        mydata.var3 = 7.1;

    printf("%.1f", mydata.var3);
    return 0;
}

Hope this will help you.

score 0 · Answer 2 · answered Jun 14 '19 at 17:01

The output is 7.1. I was wondering if someone could explain why the output is 7.1 and not 3.5.

The members of a union all share the same memory address. When you set the value of any member of a union, it modifies the bytes at that address, and since all members are mapped to this address, reading any member reinterprets whatever was written there last.

With this line:

mydata.var1 = 'B';

You set the first byte of that memory to 0x42 (the ASCII code for 'B'). Then, with the next line:

mydata.var2 = 12;

You overwrite that memory with the int value 12, so when you get here:

if(mydata.var1 == 'B')
    mydata.var3 = 3.5;
else
    mydata.var3 = 7.1;

`mydata.var1` no longer equals 'B', so the else clause is executed and the memory now holds the float value 7.1.

Here's the bigger problem with your code: a union of a char, an int, and a float doesn't really make sense.

The reason is relatively straightforward: the members need different amounts of memory.

The memory needed for a char is 1 byte, since a byte is the smallest addressable unit on the machine the code runs on. An int (aka signed int) needs at least 16 bits, which is 2 bytes on machines with 8-bit bytes, but on most machines these days it is 4 bytes (32 bits). A float on most machines is 4 bytes (32 bits) because it uses the IEEE 754 single-precision binary floating-point format.

The values the members can represent are also completely different. A signed char is usually [-128, 127]. A 32-bit two's-complement int is [-2,147,483,648, +2,147,483,647], and a float covers magnitudes from about 1.2 * 10^-38 (the smallest normalized value) to about 3.4 * 10^38, in either sign, plus zero. However, a float and an int are likely to have the same size, so a union containing just those two makes more sense.

I understand this is probably an educational or beginner exercise; however, it fails to highlight the purpose of a union and how to use one correctly.

A union is typically used to interpret the same data in different ways.

A common example is the union that many networking APIs define for IPv4 addresses:

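#include <stdint.h>  /* uint32_t */

union ipv4addr {
  uint32_t      address;    /* the whole 32-bit address */
  unsigned char octets[4];  /* the same 4 bytes, one octet at a time */
};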

This allows flexibility when passing this information to a function. Perhaps a particular function only cares about the 32-bit value while another cares only about specific bytes in that 32-bit value.

I recommend you read this answer and this one too for more information on why a union is useful and how it is correctly applied.
