4

GCC version 5.4.0, Ubuntu 16.04

I have noticed some weird behavior with the right shift in C, depending on whether or not I store the value in a variable first.

This code snippet prints 0xf0000000, which is the behavior I expected:

#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("%x", x >> 3);
}

The following two code snippets print 0x10000000, which seems very weird to me: it looks like a logical shift is being performed on a negative number.

1.

#include <stdio.h>

int main() {
    int x = 0x80000000 >> 3;
    printf("%x", x);
}

2.

#include <stdio.h>

int main() {
    printf("%x", (0x80000000 >> 3));
}

Any insight would be really appreciated. I do not know whether this is an issue specific to my personal computer, in which case it can't be replicated, or whether it is just how C behaves.

Suhas
  • 3
    On a system with 32-bit integers, `0x80000000` is an *unsigned* integer, since it cannot be represented as a 32-bit signed integer without becoming negative. Therefore, the shift is unsigned. Assigning the result to a signed integer after the shift has been performed won't affect the result. – Tom Karzes Sep 14 '18 at 21:24
  • @TomKarzes `0x80000000` is a positive number on all systems; so `0x80000000 >> 3` is always the value 0x10000000 whether it be signed or unsigned – M.M Sep 14 '18 at 22:48
  • 1
    `printf("%x", x >> 3);` causes undefined behaviour by using the wrong format specifier for `int` – M.M Sep 14 '18 at 23:00
  • @TomKarzes that only applies to systems with 32-bit integers prior to C99. Modern compilers won't have that behavior – phuclv Sep 15 '18 at 01:16
  • [(-2147483648> 0) returns true in C++?](https://stackoverflow.com/q/14695118/995714), [Why does the smallest int, −2147483648, have type 'long'?](https://stackoverflow.com/q/34724320/995714), [Why is 0 < -0x80000000?](https://stackoverflow.com/q/34182672/995714), [Why does MSVC pick a long long as the type for -2147483648?](https://stackoverflow.com/q/34725215/995714) – phuclv Sep 15 '18 at 01:21

3 Answers

4

Quoting from https://en.cppreference.com/w/c/language/integer_constant, for a hexadecimal integer constant without any suffix:

The type of the integer constant is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used.

int
unsigned int
long int
unsigned long int
long long int (since C99)
unsigned long long int (since C99)

Also, later

There are no negative integer constants. Expressions such as -1 apply the unary minus operator to the value represented by the constant, which may involve implicit type conversions.

So, if int is 32 bits wide on your machine, 0x80000000 has the type unsigned int, because the value doesn't fit in an int and an integer constant can't be negative.
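One way to check this on a particular compiler is to ask for the type directly. The following is a minimal sketch using C11 `_Generic`; `TYPE_NAME` and `print_constant_types` are hypothetical names introduced here for illustration, and the types mentioned in the comments assume a 32-bit `int`.

```c
#include <stdio.h>

/* Hypothetical helper macro: maps an expression to the name of its type. */
#define TYPE_NAME(e) _Generic((e),                  \
    int: "int",                                     \
    unsigned int: "unsigned int",                   \
    long: "long",                                   \
    unsigned long: "unsigned long",                 \
    long long: "long long",                         \
    unsigned long long: "unsigned long long",       \
    default: "other")

/* Prints the type the compiler picks for a few integer constants. */
void print_constant_types(void) {
    /* Fits in int, so the type is int. */
    printf("0x7fffffff      : %s\n", TYPE_NAME(0x7fffffff));
    /* Does not fit in a 32-bit int; the next candidate is unsigned int. */
    printf("0x80000000      : %s\n", TYPE_NAME(0x80000000));
    /* The shift keeps the (promoted) type of its left operand. */
    printf("0x80000000 >> 3 : %s\n", TYPE_NAME(0x80000000 >> 3));
    /* Unsuffixed decimal constants skip the unsigned types (C99 and later). */
    printf("2147483648      : %s\n", TYPE_NAME(2147483648));
}
```

Calling `print_constant_types()` from `main` shows which entry of the list above the compiler selected for each constant.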

The statement

int x = 0x80000000;

converts the unsigned int to an int in an implementation-defined way, but the statement

int x = 0x80000000 >> 3;

performs the right shift on the unsigned int before converting the result to an int, so the results you see are different.
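The two orders of operation can be put side by side. This is only a sketch: the function names are hypothetical, a 32-bit `int` is assumed, and the values in the comments are what GCC documents for its implementation-defined choices (wrap-around conversion to `int`, sign-extending `>>` on negative values), not something the C standard guarantees.

```c
/* Convert first, then shift: the shift sees a negative int. */
unsigned int shift_after_conversion(void) {
    int x = 0x80000000;           /* unsigned int -> int, implementation-defined; INT_MIN with GCC */
    return (unsigned int)(x >> 3); /* implementation-defined; GCC sign-extends, giving 0xf0000000 */
}

/* Shift first, then convert: the shift sees an unsigned int. */
unsigned int shift_before_conversion(void) {
    int x = 0x80000000 >> 3;      /* unsigned shift fills with zeros: 0x10000000, which fits in int */
    return (unsigned int)x;
}
```

Both functions return `unsigned int` so that the result can be passed to `%x` without a type mismatch.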

EDIT

Also, as M.M noted, the format specifier %x requires an unsigned integer argument and passing an int instead causes undefined behavior.
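A simple way to avoid that mismatch is to convert the value to `unsigned int` before handing it to `%x`. The helper below is a hypothetical sketch (the name `format_hex` is not from the question); it writes into a caller-supplied buffer with `snprintf` and returns whatever `snprintf` returns.

```c
#include <stdio.h>

/* Hypothetical helper: formats the value of an int as hexadecimal digits.
 * The cast makes the argument type match what %x expects. */
int format_hex(char *buf, size_t size, int value) {
    return snprintf(buf, size, "%x", (unsigned int)value);
}
```

The same cast works directly in a call such as `printf("%x", (unsigned int)(x >> 3));`.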

Bob__
  • Thank you! I am still new to C and working on the bit level. I did not know that 0x80000000 is by default interpreted as 2^31 and unsigned. That explanation helped! – Suhas Sep 14 '18 at 21:51
  • @Suhas C uses value semantics; `123` or `0xabc` etc. mean the mathematical number in base 10 or 16 respectively, NOT the `int` value who would be stored on the system as that bit pattern – M.M Sep 14 '18 at 22:54
0

Right-shifting a negative integer has implementation-defined behavior, so when you right-shift a negative number you can't "expect" anything.

The result is simply whatever your implementation does. It is not weird.

6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.

A left shift may even invoke UB:

6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
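If sign-propagating behavior is wanted regardless of what the implementation does, one possible approach is to shift only nonnegative values. The sketch below is an illustration, not part of the standard library: `asr` is a hypothetical name, and the identity in the comment assumes a two's complement representation of `int`.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: arithmetic (sign-propagating) right shift that never
 * applies >> to a negative value. Assumes two's complement int. */
int asr(int value, unsigned int count) {
    assert(count < sizeof(int) * CHAR_BIT); /* larger counts are undefined for >> */
    if (value >= 0)
        return value >> count;              /* well defined for nonnegative values */
    /* For negative v, ~v is nonnegative and floor(v / 2^n) == ~(~v >> n). */
    return ~(~value >> count);
}
```

For example, `asr(INT_MIN, 3)` yields `INT_MIN / 8`, whose bit pattern with a 32-bit `int` is 0xf0000000.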

0___________
  • 1
    With 32-bit integers, `0x80000000` is an unsigned integer, so the shifts are unsigned. The quotes you showed do not apply to this situation. – Tom Karzes Sep 14 '18 at 21:25
  • @TomKarzes but they apply to the type of the variable. – 0___________ Sep 14 '18 at 22:56
  • @TomKarzes int x = 0x80000000 32 bit integer is signed when assigned to the `int` variable. So you are wrong, I'm afraid. – 0___________ Sep 14 '18 at 23:02
  • You're right that the first case has the value in a signed 32-bit variable before shifting, so your comments do apply to that case. I was referring to the other two cases, the ones OP was asking about, in which the shifts are performed directly on the constant, resulting in unsigned shifts. – Tom Karzes Sep 14 '18 at 23:06
  • @TomKarzes so the OP can't expect anything from his first example. – 0___________ Sep 14 '18 at 23:56
-3

As noted by @P__J__, the result of a right shift is implementation-dependent, so you should not rely on it being consistent across platforms.

As for your specific test, which is on a single platform (possibly 32-bit Intel or another platform that uses two's complement 32-bit representation of integers), but still shows a different behavior:

GCC performs operations on literal constants using the highest precision available (usually 64-bit, but may be even more). Now, the statement x = 0x80000000 >> 3 will not be compiled into code that does right-shift at run time, instead the compiler figures out both operands are constant and folds them into x = 0x10000000. For GCC, the literal 0x80000000 is NOT a negative number. It is the positive integer 2^31.

On the other hand, x = 0x80000000 will store the value 2^31 into x, but the 32-bit storage cannot represent that as the positive integer 2^31 that you gave as an integer literal - the value is beyond the range representable by a 32-bit two's complement signed integer. The high-order bit ends up in the sign bit - so this is technically an overflow, though you don't get a warning or error. Then, when you use x >> 3, the operation is now performed at run-time (not by the compiler), with the 32-bit arithmetic - and it sees that as a negative number.

Leo K
  • 1
    With 32-bit integers, 0x80000000 is an unsigned integer, so the shifts are unsigned. – Tom Karzes Sep 14 '18 at 21:26
  • @TomKarzes: actually, `0x80000000` isn't 32-bit or 64-bit (or any other size) - it is a literal constant that the compiler can treat as it pleases. BTW: by standard, it is a signed integer - if you want to tell the compiler to treat it as unsigned you'd have to do `0x80000000U`. (it was also shifted as a signed integer - simply, the compiler did that internally with 64-bit (or higher) math, so it ended up 0x10000000, as expected). – Leo K Sep 14 '18 at 21:36
  • @LeoK [Proof that it's unsigned](http://coliru.stacked-crooked.com/a/bde32e5e144d36b6) (given `sizeof(int) == 4 && CHAR_BIT == 8`). – HolyBlackCat Sep 14 '18 at 21:38
  • @LeoK I believe it's more tightly defined than that, but the rules are complicated. I don't have a copy of the standard handy, but according to [this post](https://stackoverflow.com/questions/11310456/is-the-integer-constants-default-type-signed-or-unsigned), for *hexadecimal* constants, the type is the first type the value will fit in, where the types are `int`, `unsigned int`, `long`, `unsigned long`, `long long`, `unsigned long long`. So for a 32-bit integer size, `0x80000000` has type `unsigned int`. – Tom Karzes Sep 14 '18 at 22:34
  • "GCC performs operations on literal constants using the highest precision available " is not correct or relevant; gcc follows the C standard. To "treat it as it pleases" would be non-conforming – M.M Sep 14 '18 at 22:58
  • Storing `0x80000000` in an `int` is not an overflow either, it's an out of range assignment. "Overflow" means the result of an arithmetic operation would be out of range. (Assignment is not arithmetic) – M.M Sep 14 '18 at 23:01