In C++, why is long l = 0x80000000; positive?

C++:
long l = 0x80000000; // l is positive. Why??

int i = 0x80000000;
long l = i; // l is negative

According to this site: https://en.cppreference.com/w/cpp/language/integer_literal, 0x80000000 should be a signed int, but that doesn't appear to be the case, because sign extension doesn't occur when it gets assigned to l.

Java:
long l = 0x80000000; // l is negative

int i = 0x80000000;
long l = i; // l is negative

On the other hand, Java has a more consistent behavior.

C++ Test code:

#include <stdio.h>
#include <string.h>

void print_sign(long l) {
    if (l < 0) {
        printf("Negative\n");
    } else if (l > 0) {
        printf("Positive\n");
    } else {
        printf("Zero\n");
    }    
}

int main() {
    long l = -0x80000000;
    print_sign(l); // Positive

    long l2 = 0x80000000;
    print_sign(l2); // Positive

    int i =   0x80000000;
    long l3 = i;
    print_sign(l3); // Negative

    int i2 =  -0x80000000;
    long l4 = i2;
    print_sign(l4); // Negative
}
No Ordinary Love
  • 0x80000000 can't fit in an int; it overflows, so int i doesn't contain the value `0x80000000`, it's overflowed – unknown.prince May 22 '20 at 05:56
  • I'm pretty sure 0x80000000 can fit in an int. – No Ordinary Love May 22 '20 at 05:57
  • I guess on your system `long` is 64 bits long. A 32-bit integer would surely become negative with `0x80000000`. You can test it via `sizeof(long)` as well as `std::numeric_limits` – ALX23z May 22 '20 at 05:58
  • My system reports sizeof(int) == 4 and sizeof(long) == 8 – No Ordinary Love May 22 '20 at 05:59
  • Then 0x80000000 cannot fit in an int, it's larger than the maximum possible int which is 0x7FFFFFFF – john May 22 '20 at 05:59
  • @ALX23z Yes 0x80000000 is the smallest 4-byte integer, it's a negative number that should sign-extend when assigned to a long. – No Ordinary Love May 22 '20 at 06:00
  • @NoOrdinaryLove 0x80000000 is a positive number, that's obvious just by looking at it. It's not the smallest int, it's one bigger than the largest int. – john May 22 '20 at 06:01
  • @john But 0x80000000 is a signed int, it should fit in an int.. 0x80000000u, on the other hand, is an unsigned int and cannot fit in a signed int – No Ordinary Love May 22 '20 at 06:01
  • Then clearly your C++ long can hold positive numbers up to `2^63-1` which is much larger than `0x8000 0000`. (It is `0x7fff ffff ffff ffff`) – ALX23z May 22 '20 at 06:02
  • @NoOrdinaryLove It's not a signed int. It's a signed long. Read that reference you quoted. – john May 22 '20 at 06:02
  • 0x80000000 does not fit in an int. -0x80000000 fits. 0x80000000 does not. The highest positive int value is 0x7FFFFFFF. So assigning 0x80000000 to an int is overflow behavior. That's undefined. See link – gman May 22 '20 at 06:02
  • @NoOrdinaryLove here's the quote `The type of the integer literal is the first type in which the value can fit,` – john May 22 '20 at 06:04
  • @john Sorry I don't get it.. 0x80000000 has no suffix and it can fit in an int. Which is why it should be negative. To make things more confusing, even `long l = -0x80000000;` is positive – No Ordinary Love May 22 '20 at 06:05
  • @NoOrdinaryLove I don't know how to avoid saying the same thing over and over: 0x80000000 does not fit in an int, because it's bigger than the largest integer. It seems completely obvious to me. – john May 22 '20 at 06:06
  • @gman `long l = -0x80000000;` is positive – No Ordinary Love May 22 '20 at 06:06
  • That's because it starts as an int, it overflows (undefined). If you wanted a long then `long l = -0x80000000l;` – gman May 22 '20 at 06:07
  • 0x80000000 is a positive number. Because `int` cannot hold such a large positive number, it copies the data as is and the value is interpreted as a negative number. – ALX23z May 22 '20 at 06:08
  • @john It does fit. Sorry I'm gonna leave it at that. Even Java, a language that forbids you from assigning a value too big to a data type, allows `int i = 0x80000000;` and `long l = 0x80000000` – No Ordinary Love May 22 '20 at 06:08
  • @gman Sorry even `long l = -0x80000000l;` comes out as positive. ``` #include <stdio.h> int main() { long l = -0x80000000; if (l < 0) { printf("Negative\n"); } else if (l > 0) { printf("Positive\n"); } else { printf("Zero\n"); } } ``` – No Ordinary Love May 22 '20 at 06:10
  • You repeated the same mistake in your example. You put 0x80000000 into `l` instead of 0x80000000l (an 'l' on the end). You ended up with a value that does not fit in an int, and you got undefined behavior. – gman May 22 '20 at 06:13
  • You seem to think that because 0x80000000 is a 32bit value that it fits in a 32bit int. It doesn't. It fits in a 32bit unsigned int but not a 32bit signed int. Note: [Java defines different behavior](https://stackoverflow.com/questions/3001836/how-does-java-handle-integer-underflows-and-overflows-and-how-would-you-check-fo) than C++. Java makes overflow and underflow explicit. C++ does not. That allows C++ to optimize things that Java does not. – gman May 22 '20 at 06:15
  • @NoOrdinaryLove Sorry, but you are being told the answer, but you just refuse to accept it. If a literal is numerically bigger than the largest value of a given type, it does not fit in that type. That has nothing to do with whether it is assignable to that type. That's a different question. – john May 22 '20 at 06:15
  • But why is 0x80000000 an unsigned int? It should be a signed int. 0x80000000u is the unsigned int version. – No Ordinary Love May 22 '20 at 06:16
  • 0x80000000 is held in a 64-bit integer. Thus it is positive. – ALX23z May 22 '20 at 06:18
  • @NoOrdinaryLove In the section that explains what happens when a literal overflows it gives a list of types that are tried instead. For hexadecimal numbers without suffixes that list is `unsigned int, long, unsigned long ...` Again I suggest you just read the reference you quoted. – john May 22 '20 at 06:19
  • @john You missed the first type in the list: int – No Ordinary Love May 22 '20 at 06:20
  • @NoOrdinaryLove Yes, but we've already established that this number overflows an `int`. For the reasons I've explained repeatedly. But it doesn't overflow the second item in the list, `unsigned int`, so that is the one that is picked. – john May 22 '20 at 06:21
  • @NoOrdinaryLove I must confess, until I read that reference I didn't realise that the rules were different for decimal and hexadecimal literals. So I've learned something too. – john May 22 '20 at 06:23
  • @NoOrdinaryLove You can easily check that the type of `0x80000000` is `unsigned int`. Live demo: https://godbolt.org/z/5sJNGo. Assigning this value to `int` therefore does not make sense, since it is out of its range. If you assign it to `long`, there is no such problem. – Daniel Langr May 22 '20 at 06:34
  • The C++ standard says that "The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented". On your platform `0x80000000` cannot be represented by an `int`, but it can be represented by a `long int` - so that is the type of that literal. – Michael Burr May 22 '20 at 06:34
  • @MichaelBurr It's not, it's `unsigned int`. The rules for hexadecimal literals are different from those for decimal literals. – Daniel Langr May 22 '20 at 06:35
  • @DanielLangr: you are right - I pulled from the wrong column of the table. Either way, the initialized value of the `long l` variable is the same. – Michael Burr May 22 '20 at 06:38
  • @DanielLangr So is `-0x80000000` the correct way of specifying the value? Does the minus sign force the compiler to treat this as just an `int`? Because I tried `long l = -0x80000000;` and it still came out positive. – No Ordinary Love May 22 '20 at 06:42
  • `-0x80000000l` seems to do the trick. – No Ordinary Love May 22 '20 at 06:45
  • @NoOrdinaryLove Note the following quote from the reference: _There are no negative integer literals. Expressions such as -1 apply the unary minus operator to the value represented by the literal, which may involve implicit type conversions._ Therefore `-0x80000000` is represented as `0x80000000` of type `unsigned int`, to which unary `-` is then applied (which does not make sense for an unsigned type). – Daniel Langr May 22 '20 at 06:51
  • @DanielLangr That answers it! Thank you! – No Ordinary Love May 22 '20 at 06:54
  • @NoOrdinaryLove BTW, I don't think this is a duplicate of that question. This is a more specific problem, moreover, involving hex literals. I will vote for reopen. – Daniel Langr May 22 '20 at 06:54

2 Answers

From your link: "The type of the integer literal is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used." For hexadecimal literals without a suffix, that list begins int, unsigned int, ...

Your compiler uses 32-bit ints, so the largest (signed) int is 0x7FFFFFFF. The reason a signed int cannot represent 0x80000000...0xFFFFFFFF is that it needs some of the 2^32 possible values of its 32 bits to represent negative numbers. However, 0x80000000 fits in a 32-bit unsigned int, so that is the type of the literal. Your compiler uses 64-bit longs, which can hold up to 0x7FFF FFFF FFFF FFFF, so the value 0x80000000 also fits in a signed long, and so the long l is the positive value 0x80000000.

On the other hand, int i is a signed int and simply cannot represent 0x80000000, so the unsigned value has to be converted. Before C++20 the result of such an out-of-range conversion to a signed type is implementation-defined; since C++20 it wraps modulo 2^32 for a 32-bit int. (Overflow in signed arithmetic, by contrast, is undefined behaviour; do not rely on it wrapping, as optimisations have been known to break this.) What happens in practice is that two's-complement representation is used and the number wraps round to a large negative number. That is what has happened in this case, resulting in i being negative.

In your example code you use both 0x80000000 and -0x80000000, and in each case they have the same result. In fact, they are the same. Recall that 0x80000000 is an unsigned int. The 2003 C++ standard says in 5.3.1c7: "The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand." 0x80000000 is precisely 2^31, and so -0x80000000 is 2^32-2^31=2^31. To get the expected behaviour we would have to use -(long)0x80000000 instead.

gmatht

With the help of the awesome people on SO, I think I can answer my own question now:

Just to correct the notion that 0x80000000 can't fit in an int: It is possible to store, without loss or undefined behavior, the value 0x80000000 to an int (assuming sizeof(int) == 4). The following code can demonstrate this behavior:

#include <limits.h>
#include <stdio.h>

int main() {
    int i = INT_MIN;
    printf("%X\n", (unsigned)i); // %X expects an unsigned int, so convert explicitly
    return 0;
}

Assigning the literal 0x80000000 to a variable is a little more nuanced, though.

What the others failed to mention (except @Daniel Langr) is the fact that C++ doesn't have a concept of negative literals.

There are no negative integer literals. Expressions such as -1 apply the unary minus operator to the value represented by the literal, which may involve implicit type conversions.

With this in mind, the literal 0x80000000 is always treated as a positive number. Negations come after the size and sign have been determined. This is important: negations don't affect the unsigned/signedness of the literal, only the base and the value do. 0x80000000 is too big to fit in a signed integer, so C++ tries to use the next applicable type: unsigned int, which then succeeds. The order of types C++ tries depends on the base of the literal plus any suffixes it may or may not have.

The table is listed here: https://en.cppreference.com/w/cpp/language/integer_literal

So with this rule in mind let's work out some examples:

  1. -2147483648: Treated as a long int because the literal 2147483648 can't fit in an int; the minus sign is applied afterwards.
  2. 2147483648: Treated as a long int because C++ doesn't consider unsigned int as a candidate for decimal literals.
  3. 0x80000000: Treated as an unsigned int because C++ considers unsigned int as a candidate for non-decimal literals.
  4. (-2147483647 - 1): Treated as an int. This is typically how INT_MIN is defined to preserve the type of the literal as an int. This is the type safe way of saying -2147483648 as an int.
  5. -0x80000000: Treated as an unsigned int even though there's a negation. Negating any unsigned is undefined behavior, though.
  6. -0x80000000l: Treated as a long int and the sign is properly negated.
No Ordinary Love
  • I don't think that there is undefined behaviour in your code, since the standard defines casting negative values to unsigned by subtracting from the type's maximum value. I don't see how your test shows there is no undefined behaviour, though: undefined behaviour could result in anything *including* what you had hoped for. In any case, I think it is more accurate to say your code converts INT_MIN into 0x80000000 than that INT_MIN is 0x80000000; after all, INT_MIN is negative and the latter is not. It is true that INT_MIN is represented by the CPU as 0x80000000, though. – gmatht May 22 '20 at 10:36
  • Re: number (5) negating an unsigned int doesn't result in undefined behaviour, it is defined as subtracting the value from the maximum value of the unsigned value. See e.g. https://stackoverflow.com/questions/8026694/c-unary-minus-operator-behavior-with-unsigned-operands – gmatht May 22 '20 at 10:43
  • And `0x80000000L` is treated as `unsigned long` on Windows, because `long` on Windows is 4 bytes and positive 0x80000000 cannot fit it. And I was curious why all Windows styles are `long` and only `WS_POPUP` is `unsigned long` :) – 4LegsDrivenCat Jul 24 '22 at 16:39