comparison between signed and unsigned integer expressions and 0x80000000

Question

I have the following code:

#include <iostream>

using namespace std;

int main()
{
    int a = 0x80000000;
    if(a == 0x80000000)
        a = 42;
    cout << "Hello World! :: " << a << endl;
    return 0;
}

The output is

Hello World! :: 42

so the comparison works. But the compiler tells me

g++ -c -pipe -g -Wall -W -fPIE  -I../untitled -I. -I../bin/Qt/5.4/gcc_64/mkspecs/linux-g++ -o main.o ../untitled/main.cpp
../untitled/main.cpp: In function 'int main()':
../untitled/main.cpp:8:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     if(a == 0x80000000)
             ^

So the question is: Why is 0x80000000 an unsigned int? Can I make it signed somehow to get rid of the warning?

As far as I understand, 0x80000000 would be INT_MIN, as it's out of range for a positive integer. But why is the compiler assuming that I want a positive number?

I'm compiling with gcc version 4.8.1 20130909 on linux.

“but why is the compiler assuming, that I want a positive number” — because you didn’t use a minus sign?! — Konrad Rudolph, Jun 27 '15 at 12:48
I just tested that: `warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if(a == -0x80000000)` so it's still complaining — Adam, Jun 27 '15 at 12:50
Well yeah, the literal is too large to fit into a `signed int` so the compiler makes it `unsigned`. — Konrad Rudolph, Jun 27 '15 at 12:52
The type on the right of the equals sign is not determined by the type on the left. — Jonathan Potter, Jun 27 '15 at 13:19

6502 · Accepted Answer · 2015-06-28T06:14:11.333

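0x80000000 is an unsigned int because the value is too big to fit in an int (assuming a 32-bit int) and you did not add an L suffix to make it a long. A hexadecimal literal without a suffix gets the first of the types int, unsigned int, long, unsigned long (and so on) that can represent its value.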

The warning is issued because unsigned in C/C++ has rather surprising semantics, so it's very easy to make mistakes by mixing signed and unsigned integers. This mixing is often a source of bugs, especially because the standard library, by historical accident, chose to use an unsigned type for the size of containers (size_t).

An example I often use to show how subtle the problem is:

// Draw connecting lines between the dots
for (int i=0; i<pts.size()-1; i++) {
    draw_line(pts[i], pts[i+1]);
}

This code seems fine but has a bug. If the pts vector is empty, pts.size() is 0 but, and here comes the surprising part, pts.size()-1 is a huge nonsense number (today often 4294967295, but it depends on the platform), so the loop will use invalid indexes (with undefined behavior).

Here changing the variable to size_t i will remove the warning but is not going to help as the very same bug remains...

The core of the problem is that with unsigned values a < b-1 and a+1 < b are not the same thing even for very commonly used values like zero; this is why using unsigned types for non-negative values like container size is a bad idea and a source of bugs.

Also note that your code is not portable C++ on platforms where that value doesn't fit in an int: initializing an int from an out-of-range value gives an implementation-defined result (before C++20), and while the behavior around overflow is defined for unsigned types, it is not for regular integers. C++ code that relies on what happens when signed integer arithmetic goes past the limits has undefined behavior.

Even if you know what happens on a specific hardware platform note that the compiler/optimizer is allowed to assume that signed integer overflow never happens: for example a test like a < a+1 where a is a regular int can be considered always true by a C++ compiler.

Do you have a reference for size_t being unsigned by mistake or is it a personal opinion? — Support Ukraine, Jun 27 '15 at 14:33
@StillLearning: Of course I've a strong opinion (matured with reasoning based on simple logic) that it's a mistake. I'm not alone in thinking so (see http://stackoverflow.com/q/10168079/320726)... `unsigned` is not a "non-negative integer", but it's more like a bitmask. Container size is unsigned not because it's logical (it's not) but because of an historical accident dating back to when common CPUs were 16-bit (IMO an error even back then, btw). — 6502, Jun 27 '15 at 14:52
You have provided a good answer but I don't like that you put your personal opinion in the middle. It is misleading for new folks. Please remove that part. If you can't document that it is accepted as a mistake, your answer is not appropriate. – Support Ukraine, Jun 27 '15 at 21:36
@StillLearning: I changed `by mistake` to `by historical accident` (even if it actually was a mistake ;-) ). Also added some more explanation of why using `unsigned` for quantities is a bad idea in general. — 6502, Jun 28 '15 at 06:16
I think that is better - would prefer `historical reasons` but still it's better than `mistake`. I don't mind a personal opinion as long as it is clearly stated as such. Changed my down-vote to up-vote. :-) – Support Ukraine, Jun 28 '15 at 08:04

DanielHsH · Answer 2 · 2015-06-27T13:40:25.307

It seems you are confusing 2 different issues: The encoding of something and the meaning of something. Here is an example: You see a number 97. This is a decimal encoding. But the meaning of this number is something completely different. It can denote the ASCII 'a' character, a very hot temperature, a geometrical angle in a triangle, etc. You cannot deduce meaning from encoding. Someone must supply a context to you (like the ASCII map, temperature etc).

Back to your question: 0x80000000 is an encoding, while INT_MIN is a meaning. They are not interchangeable and not comparable. On specific hardware in some contexts they might be equal, just like 97 and 'a' are equal in the ASCII context.

The compiler warns you about ambiguity in the meaning, not in the encoding. One way to give meaning to a specific encoding is the cast operator, as in (unsigned short)-17 or (student*)ptr.

On a 32-bit system, or a 64-bit system with backward compatibility, int and unsigned int have a 32-bit encoding, as in 0x80000000. On a platform where int is 64 bits wide, INT_MIN would not be equal to this number.

Anyway - the answer to your question: in order to remove the warning you must give identical context to both left and right expressions of the comparison. You can do it in many ways. For example:

(unsigned int)a == (unsigned int)0x80000000 or (__int64)a == (__int64)0x80000000 or even a crazy (char *)a == (char *)0x80000000 or any other way as long as you maintain the following rules:

You don't demote the encoding (do not reduce the number of bits it requires). For example, (char)a == (char)0x80000000 is incorrect because you demote 32 bits into 8 bits.
You must give both the left side and the right side of the == operator the same context. For example, (char *)a == (unsigned short)0x80000000 is incorrect and will yield an error/warning.

I want to give you another example of how crucial the difference between encoding and meaning is. Look at this code:

char a = -7;  
bool b = (a==-7) ? true : false;

What is the value of 'b'? The answer may surprise you: it is implementation-defined, because the standard leaves it to the implementation whether plain 'char' is signed or unsigned. Some compilers (typically Microsoft Visual Studio) produce a program in which b is true, while with the Android NDK compilers b is false. The reason is that the Android NDK treats 'char' as 'unsigned char', while Visual Studio treats 'char' as 'signed char'. So on Android phones the encoding of -7 actually has the meaning 249, which is not equal to the meaning of (int)-7. The correct way to fix this problem is to explicitly define 'a' as signed char:

 signed char a = -7;  
 bool b = (a==-7) ? true : false;

Support Ukraine · Answer 3 · 2015-06-28T08:13:38.660


0x80000000 is considered unsigned by default. You can avoid the warning like this:

    if (a == (int)0x80000000)
        a=42;

Edit after a comment:

Another (perhaps better) way would be

    if ((unsigned)a == 0x80000000)
        a=42;

edited Jun 28 '15 at 08:13

answered Jun 27 '15 at 13:04


@MatteoItalia there are two types of engineers - those that make things work and those that spend time on theoretical stuff. Be my guest - define this as undefined - still it works on every computer today. – Support Ukraine Jun 27 '15 at 21:31
@MatteoItalia - with a current vote of zero (and downvotes to come) this few-line answer is still the only answer that solves and addresses both of OP's problems. – Support Ukraine Jun 27 '15 at 21:54
Yes, be proud of the "works on my compiler" badge, that's exactly what they thought in several recent Linux kernel bugs, where the optimizer exploited these technicalities to perform technically correct but counterintuitive optimizations. That's why you have zero upvotes (and now my downvote), love it or not that's the trend in compiler behavior - especially in this case where the workaround is simple: you should just convert the other way, as unsigned overflow is defined. – Matteo Italia Jun 27 '15 at 23:10
@MatteoItalia - please come with an example of two (commonly used) compilers which give different results for this code. – Support Ukraine Jun 28 '15 at 08:06
