81

Why does this:

#include <stdio.h>
#include <limits.h>
#include <inttypes.h>

int main() {
    enum en_e {
        en_e_foo,
        en_e_bar = UINT64_MAX,
    };
    enum en_e e = en_e_foo;
    printf("%zu\n", sizeof en_e_foo);
    printf("%zu\n", sizeof en_e_bar);
    printf("%zu\n", sizeof e);
}

print 4 8 8 in C and 8 8 8 in C++ (on a platform with 4-byte ints)?

I was under the impression that the UINT64_MAX assignment would force all the enumeration constants to at least 64 bits, but en_e_foo remains at 32 bits in plain C.

What is the rationale for the discrepancy?

chqrlie
Petr Skocik
  • 1
    Which compilers? I don't know if it makes a difference, but it might. – Mark Ransom Jan 24 '17 at 18:48
  • @MarkRansom It came up with gcc but clang behaves the same. – Petr Skocik Jan 24 '17 at 18:49
  • [Live example of C](http://ideone.com/Gxj4uR) – Drew Dormann Jan 24 '17 at 19:00
  • 3
    _"on a platform with 4 byte ints"_ It's not just the platform, but the compiler that determines type widths. That may be all this is. (Per Keith's answer, it's actually not, but be aware of such possibilities in general) – Lightness Races in Orbit Jan 25 '17 at 10:30
  • TBH, when I tagged this C/C++, I expected a torrent of downvotes, not this upvote spam... SO has changed. – Petr Skocik Jan 31 '17 at 22:19
  • Did you want to accept one of the answers? – Keith Thompson Feb 02 '17 at 00:23
  • 1
    @PSkocik: Not really a change, just that this question found a valid use of both [tag:c] and [tag:c++] (asking why certain code causes different behavior between the two). Also ok: asking how to call C libraries from C++, and how to write C++ that can be called from C. Very not ok: asking a C question and throwing a C++ tag on "so it gets more eyeballs". Also not ok: asking a C++ question and as an afterthought "make sure you answer for C as well". (and for the usual complainers -- very not ok: changing a C++ tag to a C tag because the code uses functions that exist in both standards) – Ben Voigt Jan 03 '19 at 17:42

6 Answers

80

In C, an enum constant is of type int. In C++, it's of the enumerated type.

enum en_e{
    en_e_foo,
    en_e_bar=UINT64_MAX,
};

In C, this is a constraint violation, requiring a diagnostic (if UINT64_MAX exceeds INT_MAX, which it very probably does). A C compiler may reject the program altogether, or it may print a warning and then generate an executable whose behavior is undefined. (It's not 100% clear that a program that violates a constraint necessarily has undefined behavior, but in this case the standard doesn't say what the behavior is, so that's still undefined behavior.)

gcc 6.2 doesn't warn about this. clang does. This is a bug in gcc; it incorrectly inhibits some diagnostic messages when macros from standard headers are used. Thanks to Grzegorz Szpetkowski for locating the bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71613

In C++, each enumeration type has an underlying type, which is some integer type (not necessarily int). This underlying type must be able to represent all the constant values. So in this case, both en_e_foo and en_e_bar are of type en_e, which must be at least 64 bits wide, even if int is narrower.

Keith Thompson
  • 10
    quick note: for `UINT64_MAX` not to exceed `INT_MAX` requires that `int` is at least 65 bits. – Ben Voigt Jan 24 '17 at 19:38
  • 10
    The really strange thing is that gcc (5.3.1) emits a warning with `-Wpedantic` and `18446744073709551615ULL` but not with `UINT64_MAX`. – nwellnhof Jan 24 '17 at 19:51
  • @BenVoigt Or very bad naming – Bwmat Jan 24 '17 at 20:14
  • @BenVoigt 64 bits actually - it's "not to exceed" so they can be equal, and `UINT64_MAX` fits in a `uint64_t` – dascandy Jan 24 '17 at 20:49
  • 4
    @dascandy: No, `int` must be a signed type, so it would have to be at least 65 bits to be able to represent `UINT64_MAX` (2**64-1). – Keith Thompson Jan 24 '17 at 20:50
  • @dascandy: The whole point of my comment was that `UINT64_MAX` can never ever fit in 64-bit `int`. – Ben Voigt Jan 24 '17 at 20:51
  • It also seems that enums wider than `int` are an undocumented GCC extension. – nwellnhof Jan 24 '17 at 21:39
  • @nwellnhof: Perhaps, but the fact that different enumeration constants of the same type can have different sizes is rather odd. – Keith Thompson Jan 24 '17 at 21:56
  • 1
    @KeithThompson, 6.7.2.2 says that "the identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted." My understanding is that the constants that a single C enum declares do not use the enum's type, so from there it's not a large stretch to make them different types (especially if it is implemented as an extension to the standard). – zneak Jan 24 '17 at 22:43
  • Is it not _unspecified behaviour_ in this case? – YSC Jan 25 '17 at 12:02
  • 1
    @zneak Allowing an enum's constants to be larger than the enum `int` type itself violates 6.7.2.2: "declared as constants that have type `int`" is explicit - the constants are `int` values, just like the enum itself. `en_e_bar` being larger than the enum `enum en_e` itself is *badly* broken. Imagine `send( sock, &en_e_bar, sizeof( en_e_bar ), 0 )` being sent to `enum en_e; recv( sock, &en_e, sizeof( en_e ), 0 );` – Andrew Henle Jan 25 '17 at 12:08
  • 2
    @AndrewHenle: `en_e_bar` is not bigger than the enum, `en_e_foo` is smaller. The enum variable was as large as the largest constant. – Ben Voigt Jan 25 '17 at 14:59
  • @BenVoigt *`en_e_bar` is not bigger than the enum, `en_e_foo` is smaller.* That's even *more* broken. Per the C standard, the `enum` shall be an `int`. – Andrew Henle Jan 25 '17 at 21:23
25

That code just isn't valid C in the first place.

Section 6.7.2.2 in both C99 and C11 says that:

Constraints:

The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

A compiler diagnostic is mandatory because it is a constraint violation, see 5.1.1.3:

A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined.

Community
Ben Voigt
23

In C, while an enum is considered to be a separate type, the enumerators themselves always have type int.

C11 - 6.7.2.2 Enumeration specifiers

3 The identifiers in an enumerator list are declared as constants that have type int...

Thus, the behaviour you see is a compiler extension.

I'd say it makes sense to expand the size of only the one enumerator whose value is too large.


On the other hand, in C++ all enumerators have the type of the enum they're declared in.

Because of that, the size of every enumerator must be the same. So the size of the entire enum is expanded to store the largest enumerator.

Community
HolyBlackCat
16

As others pointed out, the code is ill-formed (in C) because of a constraint violation.

There is GCC bug #71613 (reported June 2016), which states that some useful warnings are silenced with macros.

Useful warnings seem to be silenced when macros from system headers are used. For example, in the example below a warning would be useful for both enums but only one warning is shown. The same can probably happen for other warnings.

A current workaround is to prefix the macro with the unary + operator:

enum en_e {
   en_e_foo,
   en_e_bar = +UINT64_MAX,
};

which yields a compilation error on my machine with GCC 4.9.2:

$ gcc -std=c11 -pedantic-errors -Wall main.c 
main.c: In function ‘main’:
main.c:9:20: error: ISO C restricts enumerator values to range of ‘int’ [-Wpedantic]
         en_e_bar = +UINT64_MAX
Grzegorz Szpetkowski
12

C11 - 6.7.2.2/2

The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

en_e_bar=UINT64_MAX is a constraint violation, and this makes the above code invalid. A diagnostic message must be produced by a conforming implementation, as stated in the C11 draft:

A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, [...]

It seems that GCC has a bug and fails to produce the diagnostic message. (The bug is pointed out in the answer by Grzegorz Szpetkowski.)

Community
haccks
  • This doesn't explain why `sizeof en_e_foo != sizeof en_e_bar` – Ben Voigt Jan 24 '17 at 19:06
  • 1
    @BenVoigt; What do you expect when there comes undefined behavior? – haccks Jan 24 '17 at 19:07
  • 8
    "undefined behavior" is a runtime effect. `sizeof` is a compile-time operator. There's no UB here, and even if there were, it couldn't affect `sizeof`. – Ben Voigt Jan 24 '17 at 19:07
  • @haccks, `sizeof` evaluates at compile-time by necessity. `int foo[sizeof(en_e_bar)]` is a legal thing to write and doesn't result in an array of runtime-defined bounds. – zneak Jan 24 '17 at 19:10
  • @haccks: So you think the value is not `4`, it just prints that way due to undefined behavior? [The compiler disagrees.](http://rextester.com/YXFY47602) – Ben Voigt Jan 24 '17 at 19:11
  • I think there are lots of possibilities ... maybe an optimizer that replaces en_e_foo by 0 and en_e_bar by UINT64_MAX and runs before `sizeof` is evaluated? – Ingo Leonhardt Jan 24 '17 at 19:13
  • 2
    You should find the standard quote that enumerants that can't fit in an int are UB. I am highly skeptical of that statement and my vote will stay a solid -1 until this is cleared up. – zneak Jan 24 '17 at 19:15
  • 1
    @haccks: Even if there were UB here (there isn't, definition of an enumerator value is different from a variable initialization or assignment), UB can only cause arbitrary behavior at runtime. It can cause translation failure, but not arbitrarily. It's right there in your quote: "to behaving during translation or program execution **in a documented manner**" – Ben Voigt Jan 24 '17 at 19:15
  • @zneak: No, UB does not. It allows a compiler to do whatever is documented, or generate code that does whatever it likes. But the behavior at compile-time cannot be arbitrary. – Ben Voigt Jan 24 '17 at 19:16
  • 1
    Narrowing assignments results in conversion, not undefined overflows (http://port70.net/~nsz/c/c11/n1570.html#6.5.16.1p2) but this may well be UB after all because of http://port70.net/~nsz/c/c11/n1570.html#6.7.2.2p2 – Petr Skocik Jan 24 '17 at 19:17
  • It is also incorrect to say 'An enum can hold value only upto int size.' enum can hold up to any integral size, the size of the enum determined by the size of the initializers. – SergeyA Jan 24 '17 at 19:19
  • 3
    @Sergey: The C standard actually does say "The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int." but violating this would be a constraint violation, diagnostic required, not UB. – Ben Voigt Jan 24 '17 at 19:20
  • @BenVoigt; *If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined.* – haccks Jan 24 '17 at 19:23
  • 3
    @haccks: Yes? It's a constraint violation, and "A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined." – Ben Voigt Jan 24 '17 at 19:24
  • 2
    There's a difference between overflow and truncation. Overflow is when you have an arithmetic operation that produces a value too large for the expected result type, and signed overflow is UB. Truncation is when you have a value that was too big for the target type to begin with (like `short s = 0xdeadbeef`), and the behavior is implementation-defined. – zneak Jan 24 '17 at 19:25
  • @zneak; Yeah. Made a mistake. – haccks Jan 24 '17 at 19:35
  • @BenVoigt, maybe in C, but not in C++: ... type whose underlying type is not fixed... if not all enumerator values can be represented as int, an implementation-defined larger integral type that can represent all enumerator values. http://en.cppreference.com/w/cpp/language/enum – SergeyA Jan 24 '17 at 19:58
5

I took a look at the standards and my program appears to be a constraint violation in C because of 6.7.2.2p2:

Constraints: The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

and defined in C++ because of 7.2.5:

If the underlying type is not fixed, the type of each enumerator is the type of its initializing value: — If an initializer is specified for an enumerator, the initializing value has the same type as the expression and the constant-expression shall be an integral constant expression (5.19). — If no initializer is specified for the first enumerator, the initializing value has an unspecified integral type. — Otherwise the type of the initializing value is the same as the type of the initializing value of the preceding enumerator unless the incremented value is not representable in that type, in which case the type is an unspecified integral type sufficient to contain the incremented value. If no such type exists, the program is ill-formed.

Petr Skocik
  • 3
    It's not "undefined" in C, it's "ill-formed" because a constraint is violated. The compiler MUST generate a diagnostic concerning the violation. – Ben Voigt Jan 24 '17 at 19:36
  • @BenVoigt Thanks for teaching me about the difference. Fixed it in the answer (which I made because I missed a quotation from the C++ standard in the other answers). – Petr Skocik Jan 24 '17 at 20:56