I don't understand what happens when casting a bitfield.

Let's say we have this union and an example:

#include <stdio.h>

union {
  unsigned char data;
  int d : 3;
  unsigned char m : 3;
}x;

int main() {
  x.data = 182;
  // 182 (binary) -> 1 0 1 1 0 1 1 0
  printf("sizeof(x) = %lu\n", sizeof(x));
  printf("x.data = %d\n", x.data);
  printf("x.d = %d\n", x.d);
  printf("x.m = %d\n", x.m);
  printf("(unsigned char)x.d = %d\n", (unsigned char)x.d);
  printf("(signed char)x.d = %d\n", (signed char)x.d);
  printf("(signed char)x.m = %d\n", (signed char)x.m);
  printf("(unsigned char)x.m = %d\n", (unsigned char)x.m);
  return 0;
}

This is the output:

/*
sizeof(x) = 4
x.data = 182
x.d = -2
x.m = 6
(unsigned char)x.d = 254 //?
(signed char)x.d = -2    //?
(signed char)x.m = 6     //?
(unsigned char)x.m = 6   //?
*/

Now, I understand the output for x.data, x.d, and x.m, but what I don't understand is the result we get when casting.

What happens in memory when casting? Why do we get these results:

  • (unsigned char)x.d = 254
  • (signed char)x.d = -2
  • (signed char)x.m = 6
  • (unsigned char)x.m = 6

EDIT: What I do not understand is how this is handled in memory, and which parts are read when casting. I put 182 in x.data, which in binary is 10110110. x.data, x.m and x.d give the results I expect, but why does, for example, (unsigned char)x.d return 254? Why doesn't it return 182, since x.data and x.d are in the same memory location and I cast x.d to unsigned char, which is the same type as x.data?

dante
  • Declaring a bitfield within a struct with only a single range of bits is rather pointless (e.g. `int d : 3;` could simply be `int d;`). A bitfield allows designation of *multiple* ranges within a single *unsigned* type. – David C. Rankin Mar 16 '16 at 13:21
  • @DavidC.Rankin I know there is no point in doing something like this but I am practicing for a competition where things that don't make much sense show up. – dante Mar 16 '16 at 16:48

2 Answers

The behaviour on setting a union via one member and reading back a different union member is undefined in C.

It's therefore pointless trying to analyse the behaviour in this specific case.

You ought to rebuild using a solution based around a memcpy: at least then the output will be analysable.

Also, note that %zu is an appropriate format specifier for a sizeof value: currently the printf behaviour is also undefined due to your using %lu.

Bathsheba
  • Undefined doesn't equal pointless. It's undefined behaviour because it's architecture dependent (mostly because of endianness and memory alignment). On x86 it's perfectly predictable, can be and is used. – xvan Mar 16 '16 at 13:19
  • @xvan: IMHO your helpful comment does have a certain legitimacy: citing the undefinedness of union read and write always attracts attention. But it's the role of the compiler to decide what to do, not the architecture. Seriously, don't risk it. The `memcpy` approach is no more complex and is the correct thing to do. – Bathsheba Mar 16 '16 at 13:20
  • @xvan I disagree. I learned this lesson the hard way when I ported some code from one compiler to a different compiler on the same machine and got different behavior. The issue turned out to be that the two compilers took different approaches to aligning fields within unions and structures. Both were entirely valid. – Logicrat Mar 16 '16 at 13:22
  • Please provide a citation for the first paragraph. The latest standard allows type punning with certain caveats. – 2501 Mar 16 '16 at 13:23
  • @Logicrat That shouldn't happen under C99 section 6.7.2.1 [...] A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. – xvan Mar 16 '16 at 13:33
  • This accepted answer http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined-behavior seems informative. Note the question is tagged as C++ but the answer covers C too. – Bathsheba Mar 16 '16 at 13:38

Casting does not change the underlying bits of x.d or x.m, which is to be expected. x.d has the same bit pattern in either case, but it is interpreted as -2 when it is considered signed, and as 254 when those same bits are interpreted as unsigned.

The fact that these variables are grouped in a union will save space if you read and write one variable within the union, but is undefined behavior when you write one variable and read another.

Logicrat
  • Plus one for your last sentence in particular. – Bathsheba Mar 16 '16 at 13:17
  • Please provide a citation for the second paragraph. The latest standard allows type punning with certain caveats. – 2501 Mar 16 '16 at 13:22
  • It's not _undefined_, it's _unspecified_ (mentioned in Annex J, which deals with portability issues). _Unspecified_ is later explained as _implementation defined_ (which is something completely different). – mfro Mar 16 '16 at 13:42
  • What I do not understand is how this is handled in memory, and which parts are read when casting. I put 182 in x.data, which in binary is 10110110. x.data, x.m and x.d give the results I expect, but why does, for example, (unsigned char)x.d return 254? Why doesn't it return 182, since x.data and x.d are in the same memory location and I cast x.d to unsigned char, which is the same type as x.data? – dante Mar 16 '16 at 17:01
  • @dante Because you have explicitly stated that both `d` and `m` are bitfields with only 3 bits. Therefore, your data in those fields is only the bits '110'. That can be interpreted as either 6 or -2 depending on whether it's signed. And if you cast it to (unsigned char), it will first be sign-extended, and then reported as an unsigned char 11111110, which in decimal is 254. – Logicrat Mar 16 '16 at 17:58
  • @Logicrat Just one more thing: why does (unsigned char)x.m return 6? We have '110', so shouldn't it be extended to '11111110' since the first bit is 1? The same goes for (signed char)x.m = 6. Does this mean that sign extension isn't done by looking at the first bit, but rather that the compiler sees that x.m is originally an unsigned char and extends it with all zeroes, so we have 00000110 in both cases? If so, could you update your answer with this info and I will accept it. – dante Mar 16 '16 at 18:19
  • @dante `m` is declared as `unsigned`, so it doesn't get sign-extended. `d` is signed, so it does. – Logicrat Mar 16 '16 at 20:16