39

The following doesn't give me any warning whatsoever when compiled with gcc 4.5.2 on an x86 machine running Linux:

char foo = 255;

But when I use -pedantic, gcc says:

warning: overflow in implicit constant conversion

The way gcc acts is a tad strange, and it makes me doubt whether I really understand what's going on in this assignment. I think that if char is 8 bits long on POSIX and signed by default, it can't hold 255.

The C standard says that unsigned integer overflow results in wraparound, but signed integer overflow is undefined. So is this assignment undefined behavior? And why does gcc act this way?

Smi
user1042840
  • 2
    Somewhere I read that only real over-**flows** were UB and as such, the assignment operator didn't invoke undefined behavior, but I'm not sure. +1 for this question anyway. –  Sep 20 '13 at 17:32
  • 2
    But did you try to print `foo` after this? What value do you get? – smac89 Sep 20 '13 at 17:36
  • 1
    The warning is just to let you know that the number won't be what you set it to, rather than warning about any undefined behaviour, but the more general question is interesting. My intuition says that it is undefined (since `foo` is signed, so it would be undefined at runtime), but I have no idea. – Dave Sep 20 '13 at 17:36
  • Read: [Warning : overflow in implicit constant conversion](http://stackoverflow.com/questions/5095434/warning-overflow-in-implicit-constant-conversion) – Grijesh Chauhan Sep 20 '13 at 17:36
  • According to this answer: http://stackoverflow.com/a/4934719/1180785 it is indeed undefined. – Dave Sep 20 '13 at 17:39
  • 1
    @Dave Which line in that answer says it's undefined? –  Sep 20 '13 at 17:40
  • implementation-defined is not the same thing as undefined – Dennis Meng Sep 20 '13 at 17:42
  • @DennisMeng yeah, after seeing AndreyT's answer I've just spent the past couple of minutes researching exactly that distinction! – Dave Sep 20 '13 at 17:44
  • I will note that right-shifts are another "implementation-defined" thing. If that meant the same thing as "undefined" we'd all be completely screwed several times over. – Dennis Meng Sep 20 '13 at 17:46
  • @DennisMeng surely right-shifts of *unsigned* values are well defined? – Dave Sep 20 '13 at 17:47
  • Yeah, sorry, I meant right-shifts for signed values. – Dennis Meng Sep 20 '13 at 17:47
  • @Smac89: printf("foo = %d\n", foo); shows "foo = -1" – user1042840 Sep 20 '13 at 17:57
  • @user1042840 If you do `printf("foo = %c\n", foo);`, you would see why it is undefined because it should print garbage – smac89 Sep 20 '13 at 18:02
  • 1
    @Smac89: it prints `garbage` because there is no such character in ASCII, but you could use values like this to compose a Unicode character – user1042840 Sep 20 '13 at 18:09

3 Answers

30

Summary: The result is implementation-defined and very likely to be -1, but it's complicated, at least in principle.

The rules regarding overflow are different for operators vs. conversions, and for signed types vs. unsigned types -- and the conversion rules changed between C90 and C99.

As of C90, overflow of an operator with signed integer operands ("overflow" meaning that the mathematical result cannot be represented in the expression's type) has undefined behavior. For unsigned integer operands, the behavior is well defined as the usual wraparound (strictly speaking the standard doesn't call this an "overflow"). But your declaration:

char foo = 255;

doesn't use any operators (the = is an initializer, not an assignment), so none of that applies in this case.

If type char can represent the value 255 (which is true either if plain char is unsigned or if CHAR_BIT >= 9), then of course the behavior is well defined. The int expression 255 is implicitly converted to char. (Since CHAR_BIT >= 8, it's not possible for this particular case to invoke unsigned wraparound.)

Otherwise, the conversion yields a result that can't be stored in a char.

As of C90, the result of the conversion is implementation-defined -- which means that it's guaranteed to set foo to some value within the range of type char, and you can determine what that value is by reading the implementation's documentation, which is required to tell you how the conversion works. (I've never seen an implementation where the stored value is anything other than -1, but any result is possible in principle.)

C99 changed the definition, so that an overflowing conversion to a signed type either yields an implementation-defined result or raises an implementation-defined signal.

If a compiler chooses to do the latter, then it must document which signal is raised.

So what happens if an implementation-defined signal is raised? Section 7.14 of the standard says:

The complete set of signals, their semantics, and their default handling is implementation-defined

It's not entirely clear (to me) what the range of possible behaviors for the "default handling" of signals is. In the worst case, I suppose such a signal could terminate the program. You might or might not be able to define a signal handler that catches the signal.

7.14 also says:

If and when the function returns, if the value of sig is SIGFPE, SIGILL, SIGSEGV, or any other implementation-defined value corresponding to a computational exception, the behavior is undefined; otherwise the program will resume execution at the point it was interrupted.

but I don't think that applies, since an overflowing conversion is not a "computational exception" as the term is used here. (Unless the implementation-defined signal happens to be SIGFPE, SIGILL, or SIGSEGV -- but that would be silly).

So ultimately, if an implementation chooses to raise a signal in response to an overflowing conversion, the behavior (not just the result) is at least implementation-defined, and there might be circumstances in which it could be undefined. In any case, there doesn't seem to be any portable way to deal with such a signal.

In practice, I've never heard of an implementation that takes advantage of the new wording in C99. For all compilers I've heard of, the result of the conversion is implementation-defined -- and very probably yields what you'd expect from a 2's-complement truncation. (And I'm not at all convinced that this change in C99 was a good idea. If nothing else, it made this answer about 3 times as long as it would otherwise have needed to be.)

Keith Thompson
  • So this: "char foo = 255" is implementation-defined on POSIX systems, but in practice it comes down to wraparound. Is it the same for unsigned types, such as in "unsigned int foo = 9999999999"? I know it wraps around, but is it also implementation-defined? And also, why does gcc warn about "overflow" in the char case with -pedantic mode, if what happens here is not an overflow? – user1042840 Sep 20 '13 at 19:04
  • It's actually not silly at all IMO for the signal to be `SIGFPE` - while most real-world systems don't raise a signal for integer overflow, they _do_ often raise one for integer division by zero, and the one they raise is SIGFPE. Regardless of the fact that the name implies a _floating-point_ error. (Of course, the result of integer division by zero is undefined behavior) – Random832 Sep 20 '13 at 19:31
  • @user1042840: Conversion to unsigned types is defined by the standard; it effectively discards all but the low-order N bits of the result (though the standard defines the result in terms of arithmetic values, not bits). Why do you say it's not an overflow? (The C standard doesn't define that term.) – Keith Thompson Sep 20 '13 at 19:46
  • @KeithThompson: I was going by what Adam Rosenfield said: "Besides, there's no overflow occurring here -- all you have is an integer literal (255, of type int, which does not overflow) being converted to type char". But now I see that you were talking about how overflow is handled in conversions and expressions. – user1042840 Sep 20 '13 at 19:51
  • 1
    @Keith Thompson: So to sum up: 1. in conversions, signed overflow is implementation-defined (or raises an implementation-defined signal in C99), unsigned overflow results in wraparound. 2. in operators, signed overflow is undefined behavior (described in section 6.5 in point 5) and unsigned overflow results in wraparound. Did I get it right? – user1042840 Sep 20 '13 at 20:47
  • @user1042840: Yes, that's it. – Keith Thompson Sep 20 '13 at 22:12
  • I suppose that raising a signal makes sense for interpreters, but it seems silly in a compiler. – dmckee --- ex-moderator kitten Sep 21 '13 at 01:10
  • @dmckee: I'm not sure how interpreter vs. compiler matters. Here's how I think of it: A signed-to-signed or unsigned-to-signed conversion whose result doesn't fit in the target type is very probably a programming error. *It would be nice* to have a consistent way to detect and handle such errors. Raising a signal might be such a way -- **if** you could write portable code that can catch the signal and do something sensible with it, sort of stripped-down version of exception handling. The clause that C99 added, "*or an implementation-defined signal is raised"*, doesn't do that. ... – Keith Thompson Sep 21 '13 at 01:46
  • ... If you know what signal to use for a given implementation, you might be able to do something with it, but such code would only work for that implementation. There isn't even a way for a program to query whether an implementation will raise a signal, or to find out which one. And it applies only to overflow on conversions; similar overflows on arithmetic operators are simply *undefined*. I think it was intended to provide an implementation-defined hook for overflow handling, something that didn't exist in C90, ... – Keith Thompson Sep 21 '13 at 01:48
  • ... but in practice I think it's just given programmers one more thing to worry about -- and it's something that, as far as I know, never happens in the real world anyway (because compilers haven't taken advantage of it). (Yeah, this is probably way too long for a comment.) – Keith Thompson Sep 21 '13 at 01:50
  • @KeithThompson Please help me understand what you meant by "For unsigned integer operands, the behavior is well defined as the `usual wraparound`". What did you mean by *usual wraparound*? – Geek Sep 21 '13 at 03:39
  • @Geek: [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) 6.2.5p9: "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." The effect is that all but the low-order N bits of the result are discarded, where N is the width of the unsigned type. For example, if `UINT_MAX==65535`, then `UINT_MAX + 1 == 0`. – Keith Thompson Sep 21 '13 at 04:52
  • @KeithThompson: Only computations involving unsigned operands *which are too large to be promoted to `int`* cannot overflow. IMHO, there's no reason why the standard shouldn't specify that if the result of an integer operator other than `>>`, `/`, and `%` is cast or coerced to an unsigned type smaller than `int`, the computation will be performed as unsigned, and the operands will be cast or coerced likewise. Such a rule would not change any defined behavior, and probably have no effect other than to guard against wacky "optimizations", but IMHO such guarding would be a good thing. – supercat May 07 '15 at 00:08
  • @KeithThompson: Unfortunately, the Standard presently allows compilers given an expression like `uint16_t x=y*z;` to behave in weird and wacky fashion if the product is too large to fit in an `int` even though in all cases where behavior is defined the standard would require that it be computed mod 65536. – supercat May 07 '15 at 00:10
  • @supercat: There are no arithmetic operations on integer types narrower than `int`. Any such operands are promoted to `int` or to `unsigned int`. I agree that it's annoying that `uint16_t x = y * z;` can overflow. As a workaround, you can change it to `uint16_t x = (unsigned)y * (unsigned)z;`. – Keith Thompson May 07 '15 at 00:12
  • @KeithThompson: *In all cases where behavior is defined by the standard*, and probably all where it's usefully defined by a compiler, *period*, the behavior will be the same as if performed on the shorter type, or on `unsigned int`. Further, except on platforms which are faster to perform math on `int` than `unsigned int` *and* do not handle signed overflows in modular fashion, requiring code to yield modular-arithmetic results in all cases where the result is coerced to a type that would render carry beyond the highest value bit irrelevant would have zero runtime cost. I suspect that... – supercat May 07 '15 at 13:21
  • ...the standard would have made special provisions requiring modular arithmetic in such cases *except that it was something compilers naturally did anyway*. A slight variation of the proposed rule would be to say that if an expression involving those operators is coerced to an unsigned type no larger than `unsigned int`, the operands should be likewise coerced, though there might be some scenarios on weird architectures where that would "cost extra". Standards like MIRSA favor doing everything using only fixed-sized types, but it's not possible to do this kind of math portably... – supercat May 07 '15 at 13:28
  • ...without making explicit use of the type `unsigned`. While writing `x*=1u*x;` isn't the worst thing in the world, I see no benefit to saying that failure to do so should allow a compiler to regard `x == 65535` as dead code. If the standard were changed so that every compiler must either follow modular-arithmetic rules in such cases or define `__STDC_NON_MODULAR_UNSIGNED_SHORT`, I would expect 99.44% of compilers written before 2010 would already have satisfied the requirement without having to do anything, and such a change would break *zero* application code. – supercat May 07 '15 at 13:34
13

Signed integer overflow causes undefined behavior only when it occurs while evaluating intermediate results of arithmetic expressions, e.g. during binary multiplication, unary decrement etc. Also, converting a floating-point value to an integer type causes undefined behavior if the value is out of range.

An overflowing integral conversion to a signed type does not cause undefined behavior. Instead it produces an implementation-defined result (with the possibility of an implementation-defined signal being raised).

AnT stands with Russia
9

According to C11, 6.3.1.3:

When a value with integer type is converted to another integer type other than _Bool, if [...] the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Kerrek SB
  • 2
    @haccks: what do you mean? :-) – Kerrek SB Sep 20 '13 at 17:41
  • 4
    And presumably a program's behavior when it receives an "implementation-defined signal" is undefined. (The signal clause was added by C99; I don't know of any actual implementations that do that, and I'm skeptical that it was a good idea.) – Keith Thompson Sep 20 '13 at 17:47
  • @KeithThompson In this case, I'm even happier, since the code I am writing with unsigned to signed conversion is written in C89. –  Sep 20 '13 at 17:51
  • @Kerrek SB: That's true, but in n1256.pdf, section 3.4.3 says: "An example of undefined behavior is the behavior on integer overflow". Would the standard say two different things on the same matter? Maybe my understanding of overflow is incorrect. – user1042840 Sep 20 '13 at 18:29
  • But if gcc warns about "overflow in implicit constant conversion", I think this *is* an overflow. – user1042840 Sep 20 '13 at 18:43
  • @user1042840: That example clause in 3.4.3 is non-normative (i.e. informative), see clause 6 in the Foreword section. Besides, there's no overflow occurring here -- all you have is an integer literal (255, of type `int`, which does not overflow) being converted to type `char`. – Adam Rosenfield Sep 20 '13 at 18:58
  • @Adam Rosenfield: So is gcc warning misleading in this case? – user1042840 Sep 20 '13 at 18:59
  • @user1042840: It's only misleading if you expect the terminology to correlate precisely with the standard. On the other hand, since the compiler knows the concrete range of the target type, it is in a good position to tell you "this value is too large", which is essentially what it's doing. – Kerrek SB Sep 20 '13 at 19:16