
I have come across a strange behavior on signed bit-fields:

#include <stdio.h>

struct S {
    long long a31 : 31;
    long long a32 : 32;
    long long a33 : 33;
    long long : 0;
    unsigned long long b31 : 31;
    unsigned long long b32 : 32;
    unsigned long long b33 : 33;
};

long long f31(struct S *p) { return p->a31 + p->b31; }
long long f32(struct S *p) { return p->a32 + p->b32; }
long long f33(struct S *p) { return p->a33 + p->b33; }

int main() {
    struct S s = { -2, -2, -2, 1, 1, 1 };
    long long a32 = -2;
    unsigned long long b32 = 1;
    printf("f31(&s)       => %lld\n", f31(&s));
    printf("f32(&s)       => %lld\n", f32(&s));
    printf("f33(&s)       => %lld\n", f33(&s));
    printf("s.a31 + s.b31 => %lld\n", s.a31 + s.b31);
    printf("s.a32 + s.b32 => %lld\n", s.a32 + s.b32);
    printf("s.a33 + s.b33 => %lld\n", s.a33 + s.b33);
    printf("  a32 +   b32 => %lld\n",   a32 +   b32);
    return 0;
}

Using Clang on OS/X, I get this output:

f31(&s)       => -1
f32(&s)       => 4294967295
f33(&s)       => -1
s.a31 + s.b31 => 4294967295
s.a32 + s.b32 => 4294967295
s.a33 + s.b33 => -1
  a32 +   b32 => -1

Using GCC on Linux, I get this:

f31(&s)       => -1
f32(&s)       => 4294967295
f33(&s)       => 8589934591
s.a31 + s.b31 => 4294967295
s.a32 + s.b32 => 4294967295
s.a33 + s.b33 => 8589934591
  a32 +   b32 => -1

The above output shows 3 types of inconsistencies:

  • different behavior for different compilers;
  • different behavior for different bit-field widths;
  • different behavior for inline expressions and equivalent expressions wrapped in a function.

The C Standard has this language:

6.7.2 Type specifiers

...

Each of the comma-separated multisets designates the same type, except that for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.

Bit-fields are notoriously broken in many older compilers...
Is the behavior of Clang and GCC conformant or are these inconsistencies the result of one or more bugs?

chqrlie
  • You might want to take a step back, and just print the value of the six bit fields. – user3386109 Nov 13 '19 at 22:57
  • The code also has two warnings about mismatched arguments and format specifiers in the `printf`s. Until those are fixed, the code has undefined behavior, and is therefore allowed to do anything. – user3386109 Nov 13 '19 at 23:04
  • [godbolt](https://godbolt.org/z/4PAAnE) `gcc` appears to mask every calculation on `long long : 33` bit-fields with a `(1ULL << 33) - 1` mask before and after the calculation. `clang` just sign-extends `a33` and uses `rax` to calculate it; it does not apply that mask. I don't know whether this is correct: should a `long long : 33` bit-field be promoted to `long long`, or may it be promoted to some implementation-supported `__uint33_t`-like type? – KamilCuk Nov 14 '19 at 00:10
  • One issue that's confusing this is that you're ignoring the warnings about incompatible arguments for your format specifiers. You need to cast the results of your inline additions to `long long` to get consistent results in the 4th, 5th, and 6th `printf` calls, e.g. `(long long) (s.a31 + s.b31)`. Fixing this gives consistent results for the function calls vs. the inline computations, at least with `gcc`. – Tom Karzes Nov 14 '19 at 00:31
  • The compiler behavior seems unintuitive to me. One thing I noticed is that, with `gcc`, `sizeof(s.a31 + 0)` and `sizeof(s.a32 + 0)` are both 4 on my system, but `sizeof(s.a33 + 0)` is 8. I would have thought they would all be unpacked into `long long` and have size 8, but apparently not. – Tom Karzes Nov 14 '19 at 00:44
  • Do not use bit-field types other than `signed int`, `unsigned`, `_Bool`. Anything else is trouble. "A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type." C11 §6.7.2.1p5 – chux - Reinstate Monica Nov 14 '19 at 04:21
  • There is also likely implementation-defined behaviour in the `return` statement: the operand being returned may be an unsigned value out of range for `long long`. – M.M Jul 24 '22 at 22:51

2 Answers

Is the behavior of Clang and GCC conformant or are these inconsistencies the result of one or more bugs?

I think the fault most likely lies in your code. According to 6.7.2.1p5:

A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.

There's no mention of long long here, so we can't necessarily treat this code as conformant to begin with. It seems that some compilers have documented support (for example, some ARM clang targets), whereas others are happy to let the behaviour be undefined (for example, gcc manuals don't appear to list long long in the category of "Allowable bit-field types other than _Bool, signed int, and unsigned int (C99 and C11 6.7.2.1)").

Furthermore, according to 6.3.1.1p2:

The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int.

In other words, it is not enough for the compiler to support these bit-field types; it must also apply appropriate type conversions so that the expressions are converted properly. Specifically, this code looks utterly terrifying, because %lld tells printf to expect long long int, whereas I think you may only be passing an int (or unsigned, perhaps):

printf("s.a31 + s.b31 => %lld\n", s.a31 + s.b31);
printf("s.a32 + s.b32 => %lld\n", s.a32 + s.b32);
printf("s.a33 + s.b33 => %lld\n", s.a33 + s.b33);
printf("  a32 +   b32 => %lld\n",   a32 +   b32);

I'll sign off by quoting what I expect from the hairy-looking code above:

If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

-- C11/7.21.6.1p9

autistic
  • The type of the bit-field is a constraint, so if the compiler didn't define the behaviour of `long long` bit-fields, a diagnostic message is compulsory. Assuming OP didn't see a diagnostic, we could conclude that the implementation does define `long long` bit-fields, which means the implementation must document this behaviour or else it's nonconforming. In particular, it's relevant how the compiler defines the integer conversion rank of these bit-fields. – M.M Jul 24 '22 at 22:46
  • @M.M ok, you can see the link to the gcc manual that doesn't document `long long` bit-fields... Any implementation-defined behaviour must be documented here, but it isn't, so would you agree that the behaviour is probably undefined? – autistic Jul 31 '22 at 20:40
  • The behaviour is implementation-defined, and if the compiler does not document it, then the compiler is non-conforming. – M.M Jul 31 '22 at 21:18

Please have a look at the proposed code below, which works as expected.

For practical purposes, I suggest simply making sure that

  • compatible types are added,
  • correct types are returned and
  • correct types are in the printf statement.

That's it.

For more information, see references [1] and [2] below.

#include <stdio.h>

struct S {
    long long a31 : 31;
    long long a32 : 32;
    long long a33 : 33;
    
    unsigned long long b31 : 31;
    unsigned long long b32 : 32;
    unsigned long long b33 : 33;
};

long long f31(struct S *p) { return ((long long)p->a31 + (long long)p->b31); }
long long f32(struct S *p) { return ((long long)p->a32 + (long long)p->b32); }
long long f33(struct S *p) { return ((long long)p->a33 + (long long)p->b33); }

int main() {
    struct S s = { -2, -2, -2, 1, 1, 1 };
    long long a32 = -2;
    unsigned long long b32 = 1;
    
    printf("p->a31       => %lld\n", (long long)(s.a31));
    printf("p->a32       => %lld\n", (long long)(s.a32));
    printf("p->a33       => %lld\n", (long long)(s.a33));
    
    printf("p->b31       => %lld\n", (long long)(s.b31));
    printf("p->b32       => %lld\n", (long long)(s.b32));
    printf("p->b33       => %lld\n", (long long)(s.b33));
    
    
    printf("f31(&s)       => %lld\n", (long long)(f31(&s)));
    printf("f32(&s)       => %lld\n", (long long)(f32(&s)));
    printf("f33(&s)       => %lld\n", (long long)(f33(&s)));
    printf("s.a31 + s.b31 => %lld\n", ((long long)s.a31 + (long long)s.b31));
    printf("s.a32 + s.b32 => %lld\n", ((long long)s.a32 + (long long)s.b32));
    printf("s.a33 + s.b33 => %lld\n", ((long long)s.a33 + (long long)s.b33));
    printf("  a32 +   b32 => %lld\n", (long long) (a32 +   b32));
    return 0;
}

p->a31       => -2
p->a32       => -2
p->a33       => -2
p->b31       => 1
p->b32       => 1
p->b33       => 1
f31(&s)       => -1
f32(&s)       => -1
f33(&s)       => -1
s.a31 + s.b31 => -1
s.a32 + s.b32 => -1
s.a33 + s.b33 => -1
  a32 +   b32 => -1

References

[1] Signed to unsigned conversion in C - is it always safe?

[2] https://www.geeksforgeeks.org/bit-fields-c/ "We cannot have pointers to bit field members as they may not start at a byte boundary."

sidcoder
  • "as expected" - by whom? The original code adds unsigned and signed variants of the same type, so OP may be expecting implicit conversion to unsigned for the result (e.g. as in `-1L + 2UL`). Your suggestion takes away that aspect of the question. – M.M Jul 24 '22 at 22:49