12

Can someone explain the following code output to me:

#include <stdio.h>

void myprint(unsigned long a)
{
    printf("Input is %lx\n", a);
}
int main()
{
    myprint(1 << 31);       /* shift of a signed int constant */
    myprint(0x80000000);    /* hexadecimal integer constant   */
}

output with gcc main.c :

Input is ffffffff80000000
Input is 80000000

Why is (1 << 31) treated as signed and 0x80000000 treated as unsigned?

Mohit Jain
Saksham Jain
5 Answers

15

In C the type of the result of an expression depends on the types of the operands (or of some of them). In particular, 1 is an int (signed), therefore 1 << n is also an int.

The type (including signedness) of 0x80000000 is determined by the rules for integer constants (C11 6.4.4.1, quoted in another answer below), and it depends on the sizes of int and the other integer types on your system, which you haven't specified. A type is chosen such that 0x80000000 (a large positive number) is in range for that type.

In case you have any misconception: the literal 0x80000000 is a large positive number. People sometimes mistakenly equate it to a negative number, mixing up values with representations.

In your question you ask "Why is 0x80000000 treated as unsigned?". However your code does not actually rely on the signedness of 0x80000000. The only thing you do with it is pass it to a function which takes an unsigned long parameter. So whether it is signed or unsigned doesn't matter: when passed to the function it is converted to an unsigned long with the same value. (Since 0x80000000 is within the minimum guaranteed range of unsigned long, there is no chance of it being out of range.)

So, that's 0x80000000 dealt with. What about 1 << 31? If your system has a 32-bit int (or narrower), this causes undefined behaviour due to signed arithmetic overflow. If your system has a wider int, then this will produce the same output as the 0x80000000 line.

If you use 1u << 31 instead, and you have 32-bit ints, then there is no undefined behaviour and you are guaranteed to see the program output 80000000 twice.

Since your output was not 80000000 twice, we can conclude that your system has a 32-bit (or narrower) int, and that your program actually causes undefined behaviour. The type of 0x80000000 would be unsigned int if int is 32 bits wide, or unsigned long otherwise.
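
For illustration, a minimal sketch of that 1u fix, assuming a 32-bit int and a 64-bit unsigned long (e.g. a typical x86-64 target):

#include <stdio.h>

void myprint(unsigned long a)
{
    printf("Input is %lx\n", a);
}

int main(void)
{
    myprint(1u << 31);      /* unsigned shift: no undefined behaviour   */
    myprint(0x80000000);    /* unsigned int, converted to unsigned long */
    return 0;
}

/* Expected output on such a system:
   Input is 80000000
   Input is 80000000
*/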

M.M
  • The signedness of 0x80000000 definitely does matter, and can bite if code like `flags &= ~0x0000000040000000;` (clear bit 30) or `flags &= ~0x0000000100000000;` (clear bit 32) needs to be modified to clear bit 31. The most obvious logical modification in either case, `flags &= ~0x0000000080000000;`, won't work on 32-bit systems, even though the code would work fine with the other two values. – supercat May 02 '16 at 16:43
  • @supercat my answer is talking about the code that OP posted. `0x80000000` is converted to `unsigned long` , and not any other operation. – M.M May 02 '16 at 19:57
  • If 0x80000000 had been promoted to signed long, the output of the program would have been different. – supercat May 02 '16 at 20:11
  • @supercat there is no promotion involved here. I guess you mean "If 0x80000000 had type `signed long`...", however the output of the program would be the same in that case. – M.M May 02 '16 at 20:29
  • @M.M can you confirm whether my answer(posted below) is correct. I think 0x80000000 is first converted to unsigned and then converted to unsigned long when passed as argument to myprint function. – Saksham Jain May 03 '16 at 04:31
  • @SakshamJain `0x80000000` is `unsigned int` already on your system. It is not "converted to unsigned". When you call the function, unsigned int is converted from unsigned int to unsigned long, which does not change its value. – M.M May 03 '16 at 04:35
6

Why is (1 << 31) treated as signed and 0x80000000 treated as unsigned?

From 6.5.7 Bitwise shift operators in the C11 spec:

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]
4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

So, because 1 is an int (see section 6.4.4.1, quoted in the following paragraph), 1 << 31 is also an int, whose value is not well defined on systems where int is 32 bits or narrower. (It may even trap.)


From 6.4.4.1 Integer constants

3 A decimal constant begins with a nonzero digit and consists of a sequence of decimal digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and the letters a (or A) through f (or F) with values 10 through 15 respectively.

and

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix   |   Decimal Constant        |   Octal or Hexadecimal Constant
---------+---------------------------+---------------------------------
none     |   int                     |   int
         |   long int                |   unsigned int
         |   long long int           |   long int
         |                           |   unsigned long int
         |                           |   long long int
         |                           |   unsigned long long int
---------+---------------------------+---------------------------------
u or U   |   unsigned int            |   unsigned int
[...]    |   [...]                   |   [...]

So, on a system where int has 32 or fewer bits and unsigned int has 32 or more bits, 0x80000000 is an unsigned int.
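
For a quick check of which types the compiler actually picked, here is a small C11 sketch using _Generic (the commented results assume a 32-bit int and a 64-bit long; other systems will differ):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),              \
    int:                "int",                  \
    unsigned int:       "unsigned int",         \
    long:               "long",                 \
    unsigned long:      "unsigned long",        \
    long long:          "long long",            \
    unsigned long long: "unsigned long long",   \
    default:            "other")

int main(void)
{
    puts(TYPE_NAME(0x80000000));   /* "unsigned int": hex constant, doesn't fit in int      */
    puts(TYPE_NAME(2147483648));   /* "long": decimal constant, only signed types are tried */
    return 0;
}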

Mohit Jain
  • "Not well defined" is an euphemism. It can make nasal daemons appear. – too honest for this site May 08 '16 at 14:14
  • You are right @Olaf. At the time of writing I was not sure whether the result of `signed int`egral type overflow is unspecified or undefined. All I knew was `not defined` and that it may trap. (It actually traps on some systems.) Moreover my answer focuses on **type** rather than **value**. After consulting a little more, I believe it is undefined. – Mohit Jain May 09 '16 at 06:41
  • The standard is very clear about that. Just out of curiosity: On which existing systems/architecture do overflows of the standard integer types and their operations actually generate an exception? I suppose that's what you mean with "trap". A _trap representation_ is a bit different, but I'd be curious to hear which systems do this, too. – too honest for this site May 09 '16 at 11:10
  • http://stackoverflow.com/questions/16634110/difference-between-add-and-addu. Don't know which compilers practice this actually. But gcc has an option ([`ftrapv`](https://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Code-Gen-Options.html)) to enable trapping. – Mohit Jain May 09 '16 at 11:13
  • The 68K also had a trapv instruction. But that is about the machines, not about C. The gcc option is more interesting (I just forgot about that), but I'm not sure if it really supports all architectures. Some might not even flag overflow, so you would have to detect it in advance, requiring a lot of code. – too honest for this site May 09 '16 at 11:39
  • @Olaf Fully agreed. Would prefer a safe_addition, safe_multiplication etc when I really care about overflow. (I needed it twice in production code, both times in calculation inside binary search to clamp the result in -MAX, +MAX. Once I implemented using operation in wider type and second time using pre-condition checking on arguments) – Mohit Jain May 09 '16 at 11:53
  • Some CPUs have saturated add/sub. But mostly for DSP algorithms. – too honest for this site May 09 '16 at 11:57
2

You apparently use a system with 32-bit int and unsigned int.

1 fits into an int, thus it is a signed int; 0x80000000 does not. While for decimal constants the next larger signed type that can hold the value would be used, for hexadecimal and octal constants the corresponding unsigned type is tried first, if the value fits. This is because they are commonly used for unsigned values anyway. See the C standard, 6.4.4.1p5, for the complete value/type matrix.

For signed integers, a left shift that changes the sign is undefined behaviour. This implies all bets are off, because you are outside the language specification.

That said, the following is an interpretation of the results (a short sketch follows the list):

  • long is apparently 64 bits on your system.
  • The int shifted the 1 into the sign-bit as you might have expected.
  • This results in a negative int.
  • Negative ints are converted to unsigned such that a 2's complement representation does not need any operations (just reinterpretation of the bit-pattern)
  • As you use a 64-bit unsigned long, the sign is extended into the upper bits for the argument to myprint.
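
Here is a small sketch of that interpretation which reproduces the first output line without performing the undefined shift itself (it assumes a 32-bit two's complement int and a 64-bit unsigned long):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int n = INT_MIN;                  /* bit pattern 0x80000000 in a 32-bit int  */
    unsigned long u = n;              /* negative value converted modulo 2^64    */
    printf("Input is %lx\n", u);      /* prints "Input is ffffffff80000000" here */
    return 0;
}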

How to avoid it:

  • Always use unsigned integers when shifting (e.g. append U suffix to integer constants where appropriate, here: 1U, or 0x1U).
  • Be aware about the standard integer conversions when using smaller types than int.
  • In general, if you need a specific size, you definitely should use the stdint.h fixed-width types. Note that the standard integer types have no fixed bit width. For 32 bits, use uint32_t for variables. For constants, use the macros: UINT32_C(1) (without a suffix!); see the sketch below.
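
A short sketch of that fixed-width approach (only illustrative; the shift is well-defined because the operand is unsigned and at least 32 bits wide):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t bit31 = UINT32_C(1) << 31;               /* well-defined unsigned shift */
    printf("Input is %lx\n", (unsigned long)bit31);   /* prints "Input is 80000000"  */
    return 0;
}
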
too honest for this site
1

My thought: The argument to the first call to 'myprint()' is an expression, so it has to be calculated at runtime. So the compiler is required to interpret it (via generated instructions) as a signed int left shift, producing a negative signed int, which is then sign-extended to fill a long and reinterpreted as unsigned long. (I think that this might be a compiler error?)

By contrast, the second call to 'myprint()' is a hard-coded integer constant expression, being passed to a routine taking unsigned long as argument; I think the compiler is written to assume from this context that the constant expression is already an unsigned long due to there being no overt conflicting type information.

PMar
0

Correct me if I am wrong. This is what I have understood.

On my machine, as M.M said, sizeof(int) = 4. (Confirmed by printing sizeof(int))

So, 1 << 31 becomes (signed) 0x80000000 as 1 is signed. But 0x80000000 becomes unsigned, as it can't fit in a signed int (because it is treated as positive, and the maximum positive value an int can hold is 0x7fffffff).

So when a signed int is converted to the (unsigned) long parameter, sign extension takes place (the value is extended using the sign bit). And when an unsigned int is converted, it is extended using 0's (see the comparison sketch at the end of this answer).

So that is why there are extra 1's in the case of myprint(1 << 31), and this is not the case in either of:

1) myprint(1u << 31)

2) myprint(1 << 31) when int is wider than 32 bits, because in that case the sign bit is not 1.
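
For completeness, a small comparison sketch of the two conversions, starting from INT_MIN instead of the undefined shift, and assuming a 32-bit int and a 64-bit unsigned long:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int s = INT_MIN;                      /* signed int, bit pattern 0x80000000 */
    unsigned int z = 0x80000000u;         /* unsigned int, same bit pattern     */

    printf("%lx\n", (unsigned long)s);    /* ffffffff80000000: sign-extended    */
    printf("%lx\n", (unsigned long)z);    /* 80000000: zero-extended            */
    return 0;
}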

Saksham Jain
  • `1 << 31` is undefined behaviour; you cannot rely on or infer anything from what happens after that. I prefer not to speculate further here, because people read it and think to themselves "it's not really undefined, the real story is bla bla bla and I can rely on that". – M.M May 03 '16 at 04:38
  • @M.M Thanks. I just got it. Should have read the link you provided earlier: http://stackoverflow.com/questions/26319592/1-31-produces-the-error-the-result-of-the-expression-is-undefined Thanks a ton! – Saksham Jain May 03 '16 at 04:48
  • @M.M I doubt I had. The result of E1 << E2 is undefined if E1 × 2^E2 is unrepresentable in the result type. So does this mean that when the compiler left-shifts by one to multiply by 2, it is taking a chance that the behaviour might be undefined? (Overflow is still well-defined behaviour, right?) – Saksham Jain May 04 '16 at 07:02
  • signed overflow is undefined. Left shift and multiplying by 2 may cause undefined behaviour due to overflow. – M.M May 04 '16 at 07:15