Consider this C++ code:

enum class Color : char { red = 0x1, yellow = 0x2 };
// ...
char *data = ReadFile();
Color color = static_cast<Color>(data[0]);

Suppose that data[0] is actually 100. What is color set to according to the standard? In particular, if I later do

switch (color) {
    // ... red and yellow cases omitted
    default:
        // handle error
        break;
}

does the standard guarantee that default will be hit? If not, what is the proper, most efficient, most elegant way to check for an error here? Does the standard make any guarantees about this with a plain enum?

Braiam
darth happyface

1 Answer

What is color set to according to the standard?

Answering with a quote from the C++11 and C++14 Standards:

[expr.static.cast]/10

A value of integral or enumeration type can be explicitly converted to an enumeration type. The value is unchanged if the original value is within the range of the enumeration values (7.2). Otherwise, the resulting value is unspecified (and might not be in that range).

Let's look up the range of the enumeration values: [dcl.enum]/7

For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.

Before CWG 1766 (C++11, C++14): For data[0] == 100, the resulting value is therefore specified(*), and no Undefined Behaviour (UB) is involved. More generally, because you cast from the underlying type to the enumeration type, no value in data[0] can lead to UB for the static_cast.

After CWG 1766 (C++17): See CWG defect 1766. The [expr.static.cast]p10 paragraph has been strengthened, so casting a value that is outside the representable range of an enum to the enum type is now UB. This still doesn't apply to the scenario in the question, since data[0] is of the underlying type of the enumeration (see above).

Please note that CWG 1766 is considered a defect in the Standard, hence it is acceptable for compiler implementers to apply it to their C++11 and C++14 compilation modes.

(*) char is required to be at least 8 bits wide, but isn't required to be unsigned. The maximum value storable is required to be at least 127 per Annex E of the C99 Standard.
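To make the fixed-underlying-type case concrete, here is a minimal sketch. The helpers round_trip and check_round_trip are hypothetical names of mine, not part of the question's code; the sketch only illustrates that a char value matching no enumerator survives the conversion unchanged.

```cpp
#include <cassert>
#include <type_traits>

enum class Color : char { red = 0x1, yellow = 0x2 };

// The enum-base fixes the underlying type, so every char value is a value of Color.
static_assert(std::is_same<std::underlying_type<Color>::type, char>::value,
              "Color's underlying type is char");

// Hypothetical helper: convert a raw char to Color and back again.
constexpr char round_trip(char raw) {
    return static_cast<char>(static_cast<Color>(raw));
}

// 100 matches no enumerator, yet it is within the range of the enumeration values.
static_assert(round_trip(100) == 100, "100 survives the conversion unchanged");

// Runtime version of the same check, e.g. for a byte read from a file.
inline void check_round_trip(char raw) {
    assert(round_trip(raw) == raw);
}
```

Calling check_round_trip(data[0]) should never fire the assertion, whatever byte was read, because the source type is exactly the underlying type.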


Compare to [expr]/4

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

Before CWG 1766, the conversion integral type -> enumeration type can produce an unspecified value. The question is: Can an unspecified value be outside the representable values for its type? I believe the answer is no -- if the answer were yes, there wouldn't be any difference in the guarantees you get for operations on signed types between "this operation produces an unspecified value" and "this operation has undefined behaviour".

Hence, prior to CWG 1766, even static_cast<Color>(10000) would not invoke UB; but after CWG 1766, it does invoke UB.


Now, the switch statement:

[stmt.switch]/2

The condition shall be of integral type, enumeration type, or class type. [...] Integral promotions are performed.

[conv.prom]/4

A prvalue of an unscoped enumeration type whose underlying type is fixed (7.2) can be converted to a prvalue of its underlying type. Moreover, if integral promotion can be applied to its underlying type, a prvalue of an unscoped enumeration type whose underlying type is fixed can also be converted to a prvalue of the promoted underlying type.

Note: The underlying type of a scoped enum w/o enum-base is int. For unscoped enums the underlying type is implementation-defined, but shall not be larger than int if int can contain the values of all enumerators.

For an unscoped enumeration, this leads us to [conv.prom]/1

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

In the case of an unscoped enumeration, we would be dealing with ints here. For scoped enumerations (enum class and enum struct), no integral promotion applies. Either way, the integral promotion doesn't lead to UB, as the stored value is in the range of the underlying type and in the range of int.
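The difference between the two kinds of enumeration can be checked at compile time. This is only a sketch: Scoped, Unscoped and the two helper functions are made-up names for illustration, and the decltype check reflects how I read [conv.prom]/4 for an unscoped enum with underlying type char.

```cpp
#include <type_traits>

enum class Scoped : char { a = 1 };  // scoped, fixed underlying type
enum Unscoped : char { b = 1 };      // unscoped, fixed underlying type

// A scoped enumeration has no implicit conversion to an integer type,
// so no integral promotion applies to it.
static_assert(!std::is_convertible<Scoped, int>::value,
              "scoped enums do not promote");

// An unscoped enumeration converts implicitly, and unary + promotes it to int.
static_assert(std::is_convertible<Unscoped, int>::value,
              "unscoped enums promote");
static_assert(std::is_same<decltype(+b), int>::value,
              "promoted type is int");

// Hypothetical helpers showing the two routes to an integer value.
constexpr int promoted(Unscoped u) { return u; }                  // implicit
constexpr int explicit_value(Scoped s) { return static_cast<int>(s); }  // explicit only
```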

[stmt.switch]/5

When the switch statement is executed, its condition is evaluated and compared with each case constant. If one of the case constants is equal to the value of the condition, control is passed to the statement following the matched case label. If no case constant matches the condition, and if there is a default label, control passes to the statement labeled by the default label.

The default label should be hit.

Note: One could take another look at the comparison operator, but it is not explicitly used in the referred "comparison". In fact, there's no hint it would introduce UB for scoped or unscoped enums in our case.
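Putting the pieces together for the code in the question, a sketch could look like the following. Parsed, classify and check_classify are hypothetical names used only for this illustration; the cast and the switch are the pattern from the question.

```cpp
#include <cassert>

enum class Color : char { red = 0x1, yellow = 0x2 };

// Hypothetical result type, only here so the outcome can be observed.
enum class Parsed { red, yellow, error };

Parsed classify(char raw) {
    // Well-defined for every char value: char is the fixed underlying type.
    Color color = static_cast<Color>(raw);
    switch (color) {
        case Color::red:    return Parsed::red;
        case Color::yellow: return Parsed::yellow;
        default:            return Parsed::error;  // reached for e.g. raw == 100
    }
}

// Small usage check for the three interesting inputs.
inline void check_classify() {
    assert(classify(0x1) == Parsed::red);
    assert(classify(0x2) == Parsed::yellow);
    assert(classify(100) == Parsed::error);
}
```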


As a bonus, does the standard make any guarantees about this with a plain enum?

Whether or not the enum is scoped doesn't make any difference here. However, it does make a difference whether or not the underlying type is fixed. The complete [dcl.enum]/7 is:

For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where e_min is the smallest enumerator and e_max is the largest, the values of the enumeration are the values in the range b_min to b_max, defined as follows: Let K be 1 for a two's complement representation and 0 for a one's complement or sign-magnitude representation. b_max is the smallest value greater than or equal to max(|e_min| − K, |e_max|) and equal to 2^M − 1, where M is a non-negative integer. b_min is zero if e_min is non-negative and −(b_max + K) otherwise.

Let's have a look at the following enumeration:

enum ColorUnfixed /* no fixed underlying type */
{
    red = 0x1,
    yellow = 0x2
};

Note that we cannot define this as a scoped enum, since all scoped enums have fixed underlying types.

Fortunately, ColorUnfixed's smallest enumerator is red = 0x1, so max(|e_min| − K, |e_max|) is equal to |e_max| in any case, which is yellow = 0x2. The smallest value greater than or equal to 2 that is equal to 2^M − 1 for a non-negative integer M is 3 (2^2 − 1). (I think the intent is to allow the range to extend in 1-bit steps.) It follows that b_max is 3 and b_min is 0.

Therefore, 100 would be outside the range of ColorUnfixed, and the static_cast would produce an unspecified value before CWG 1766 and undefined behaviour after CWG 1766.
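The arithmetic for b_max can be written down as a small compile-time function. This is a sketch that assumes all enumerators are non-negative (so b_min is 0 and K does not matter); b_max_for and in_unfixed_range are names I made up for it.

```cpp
// Smallest value of the form 2^M - 1 (M >= 0) that is >= e_max.
// The candidates visited are 0, 1, 3, 7, 15, ...
constexpr unsigned long long b_max_for(unsigned long long e_max,
                                       unsigned long long candidate = 0) {
    return candidate >= e_max ? candidate
                              : b_max_for(e_max, candidate * 2 + 1);
}

// With non-negative enumerators b_min is 0, so only the upper bound matters.
constexpr bool in_unfixed_range(unsigned long long value,
                                unsigned long long e_max) {
    return value <= b_max_for(e_max);
}

// ColorUnfixed: e_max is yellow = 2, hence b_max = 3 = 2^2 - 1.
static_assert(b_max_for(2) == 3, "range of ColorUnfixed is 0..3");
static_assert(in_unfixed_range(3, 2), "3 is still within the range");
static_assert(!in_unfixed_range(100, 2), "100 is outside the range");
```

So static_cast<ColorUnfixed>(3) keeps its value even though no enumerator equals 3, while 100 falls outside the range.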

dyp
  • The underlying type is fixed, so the range of the enumeration values (§7.2 [dcl.enum] p7) is "the values of the underlying type". 100 is certainly a value of `char`, so "The value is unchanged if the original value is within the range of the enumeration values (7.2)." applies. – Casey Aug 12 '13 at 20:04
  • @Casey Yes, thanks, I was just about looking up again the range of the enumeration type. – dyp Aug 12 '13 at 20:09
  • Thank you. So, to confirm, I should arguably use unsigned char as the base of Color, not char, just in case someone were to be using an unsigned char buffer? – darth happyface Aug 12 '13 at 20:50
  • @darthhappyface I don't think this would help. If you use `unsigned char` as the underlying type, and someone uses a `signed char` buffer (or `char` if that's signed), [expr.static.cast]/10 still gives problems as the negative values of `signed char` are not within the range of the underlying type `unsigned char`. That said, that's all a bit academic, as conversions from signed to unsigned types are well-defined in other contexts [conv.integral]/2. – dyp Aug 12 '13 at 20:55
  • @darthhappyface What about providing a class with a clean interface? – dyp Aug 12 '13 at 20:56
  • I'm not sure what you mean. The code base I'm working on uses a lot of switch-case blocks on variables of the enum type. It is useful because the compiler can check to make sure that all the possible enumerators are covered. However, it stores the enumerator on disk, which is where the char buffers come. And it uses the static_cast pattern I was asking about. Is there a way to switch to a class without losing any of these properties? – darth happyface Aug 12 '13 at 22:17
  • The idea was that if this `static_cast` is error prone (because it compiles with any integral type as argument), it shouldn't be done by everybody ("just in case someone were to be using an unsigned char buffer"). A function to perform this cast or even a class managing the enum value (e.g. providing a safe "setter" or "loader" and a "getter") might be appropriate. – dyp Aug 12 '13 at 22:52
  • I had to search to look up what "UB" meant. ('undefined behaviour') The question didn't mention the possibility of undefined behaviour; so it didn't occur to me that you might be talking about that. – karadoc Oct 22 '13 at 10:00
  • @karadoc I've added a link at the first occurrence of the term. – dyp Oct 22 '13 at 16:09
  • I just missed a paragraph. – David Stone Feb 01 '15 at 22:19
  • Love this answer. For those skimming too quickly, note that the last sentence "Therefore, 100 would be outside the range..." only applies if the code were modified to remove the underlying type specification (char in this case). I think that's what was meant, anyway. – Eric Seppanen Mar 29 '16 at 18:45
  • @dyp So, doesn't CWG 1766 invalidate further text of your answer? Or has this change not made it into the standard? – Ruslan Aug 29 '16 at 14:31
  • @Ruslan CWG 1766 (or the resolution thereof) is **not** part of C++14, but I think it will be part of C++17. Even with the C++17 rules, I don't quite understand what you mean with "invalidate further text of your answer". The other parts of my answer are mainly concerned with what the "range of enumeration values" is that expr.static.cast p10 is referring to. – dyp Aug 29 '16 at 16:51
  • Ah, I read the rest incorrectly, sorry. It now looks OK. One bit though: _required to be smaller than `int` if `int` can contain the values of all enumerators_: what did you mean by this? How can the type be smaller than `int` if e.g. one of enumerator values is `INT_MAX`? – Ruslan Aug 29 '16 at 18:18
  • @Ruslan Yeah that doesn't make much sense. Fixed, thanks. – dyp Aug 29 '16 at 20:58
  • Oh, geeze, there isn't even a safety guarantee when the underlying type is explicitly unsigned? They are making it *very* difficult to use enums safely. – user2357112 Sep 26 '16 at 21:30
  • @user2357112 I think casting to the underlying type first and then casting to the enumeration should be safe (if the underlying type is unsigned). I agree that's not straight-forward, though. – dyp Sep 27 '16 at 07:31
  • @dyp It is still unclear what causes a UB because of what Eric Seppanen said: "..."Therefore, 100 would be outside the range..." only applies if the code were modified to remove the underlying type specification..." And there was no followup to that, and as I read it from your answer: "Therefore, 100 would be outside the range of the enum, and the static_cast would produce an unspecified value, which could lead to UB as per [expr]/4." -- This leads me to believe even though the UT is CHAR(8bits), it will still be UB ... Correct? Please make this more clear. Is there anything new for C++17? – Mike5050 Mar 23 '18 at 17:16
  • I would also like to add an example if I may to clear this up. enum class foo : unsigned int { bar = 1 }; Is it UB if you do foo bar = foo:bar + foo:bar (assuming you have bitwise operations on). Is bar value 2? Is it UB? This needs to be clear, I will make a question if this is not answered. – Mike5050 Mar 23 '18 at 17:30
  • @Mike5050 I have changed my stance on the "unspecified value" topic and adjusted my answer. Hopefully it now answers most things left unclear. For your example of the enum `foo`, not sure what you mean with "have bitwise operations on", but `foo::bar + foo::bar` can't compile since there are no arithmetic operations on scoped enums. If you implement them or just turn `foo` into an unscoped enum with fixed underlying type => no UB since 2 is representable by `unsigned int`. If you turned `foo` into an enum w/o fixed underlying type, then `b_max == 1` ==> `bar + bar` is UB. – dyp Mar 25 '18 at 13:21