
For a school project, I have to code the C function printf. Things are going pretty well, but there is one question I can't find a good answer to, so here I am.

printf("PRINTF(d) \t: %d\n", -2147483648);

tells me (gcc -Werror -Wextra -Wall):

   error: format specifies type 'int' but the argument has type 'long'
      [-Werror,-Wformat]
        printf("PRINTF(d) \t: %d\n", -2147483648);
                              ~~     ^~~~~~~~~~~
                              %ld

But if I use an int variable, everything works fine:

int i;

i = -2147483648;
printf("%d", i);

Why?

EDIT:

I understood many points, and they were very interesting. Anyway, I guess printf uses the <stdarg.h> library, and so va_arg(va_list ap, type) should also return the right type. For %d and %i, the type returned is obviously an int. Does that change anything?
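To illustrate what I mean, here is a minimal sketch (my_print_int is just a toy name, not part of my project) of how a printf clone fetches a %d argument:

#include <stdarg.h>
#include <stdio.h>

/* Toy fragment of a printf clone: for %d the fetch must be
 * va_arg(ap, int). If the caller actually passed a long (as with the
 * constant -2147483648), the behaviour is undefined. */
void my_print_int(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int n = va_arg(ap, int); /* assumes an int really was passed */
    va_end(ap);
    printf("got: %d\n", n);
}

int main(void)
{
    int i = -2147483647 - 1;

    my_print_int("%d", i); /* fine: the argument is an int */
    /* my_print_int("%d", -2147483648); -- would pass a long: UB */
    return 0;
}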

  • To your additional question which I already answered, but then my comment pertaining that was deleted: `va_arg()` doesn't know what type the argument you try to fetch has. You need to know that and if you try to fetch a different type than what was passed as an argument, that's undefined behaviour. This also applies if you do `printf("%d\n", -2147483648)` as the argument has type `long` but `printf` tries to fetch an `int`. – fuz Jan 19 '16 at 07:40
  • Not a duplicate. If you read the accepted answer on the other question, it is due to undefined behavior specific to the question's context. This question is not about undefined behavior. This question is about C and the other is about C++. Both languages have similar rules for promotion, but there can be subtle differences. This question will help more future visitors, and perhaps already has if the much higher vote counts are any indication. – Adrian McCarthy Mar 01 '16 at 19:32
  • Also not a duplicate of the second question. The key fact there is that they specified the literal in hex, which means it's unsigned int rather than signed long. – Adrian McCarthy Mar 01 '16 at 19:34
  • @AdrianMcCarthy *they specified the literal in hex, which means it's unsigned int rather than signed long.* That can be read as "hex constants are always unsigned". Old pre-standard C compilers often made hex constants unsigned, so that might be confusing. Per [**6.4.4.1 Integer constants**, paragraph 5](https://port70.net/~nsz/c/c11/n1570.html#6.4.4.1p5): "The type of an integer constant is the first of the corresponding list in which its value can be represented." In this case, `0x80000000` is an `unsigned int` because it fits in an `unsigned int` but is too big for `[signed] int`. – Andrew Henle Jul 17 '18 at 19:48
  • @AdrianMcCarthy I rewrote my comment to make it clear I was just trying to clarify, and deleted my previous comment – Andrew Henle Jul 17 '18 at 19:49

2 Answers


In C, -2147483648 is not an integer constant. 2147483648 is an integer constant, and - is just a unary operator applied to it, yielding a constant expression. The value 2147483648 does not fit in an int (it is one too large; 2147483647 is typically the largest value an int can hold), so the integer constant has type long, which causes the problem you observe. If you want to mention the lower limit of an int, either use the macro INT_MIN from <limits.h> (the portable approach) or carefully avoid mentioning 2147483648:

printf("PRINTF(d) \t: %d\n", -1 - 2147483647);
– fuz

The problem is that -2147483648 is not an integer literal. It's an expression consisting of the unary negation operator - and the integer 2147483648, which is too big to be an int if ints are 32 bits. Since the compiler will choose an appropriately-sized signed integer to represent 2147483648 before applying the negation operator, the type of the result will be larger than an int.

If you know that your ints are 32 bits, and want to avoid the warning without mutilating readability, use an explicit cast:

printf("PRINTF(d) \t: %d\n", (int)(-2147483648));

That's defined behaviour on a 2's complement machine with 32-bit ints.

For increased theoretical portability, use INT_MIN instead of the number, and let us know where you found a non-2's-complement machine to test it on.


To be clear, that last paragraph was partly a joke. INT_MIN is definitely the way to go if you mean "the smallest int", because int varies in size. There are still lots of 16-bit implementations, for example. Writing out -2^31 is only useful if you definitely always mean precisely that value, in which case you would probably use a fixed-sized type like int32_t instead of int.
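For instance, a minimal sketch if you really do mean exactly -2^31 (this assumes <stdint.h> provides the optional exact-width type):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t lo = INT32_MIN;      /* exactly -2^31, by definition */
    printf("%" PRId32 "\n", lo); /* prints -2147483648 */
    return 0;
}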

You might want some alternative to writing out the number in decimal to make it clearer for those who might not notice the difference between 2147483648 and 2174483648, but you need to be careful.

As mentioned above, on a 32-bit 2's-complement machine, (int)(-2147483648) will not overflow and is therefore well-defined, because -2147483648 will be treated as a wider signed type. However, the same is not true for (int)(-0x80000000). 0x80000000 will be treated as an unsigned int (since it fits into the unsigned representation); -0x80000000 is well-defined (but the - has no effect if int is 32 bits), and the conversion of the resulting unsigned int 0x80000000 to int involves an overflow. To avoid the overflow, you would need to cast the hex constant to a signed type: (int)(-(long long)(0x80000000)).
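A quick way to watch these types in action (the sizes shown assume 32-bit int and 64-bit long, as on typical x86-64 Linux):

#include <stdio.h>

int main(void)
{
    /* 2147483648 doesn't fit in int, so the constant is a signed
     * long here: 8 bytes. */
    printf("%zu\n", sizeof (-2147483648)); /* 8 */

    /* 0x80000000 fits in unsigned int, so it stays 4 bytes and the
     * unary minus wraps modulo 2^32. */
    printf("%zu\n", sizeof (-0x80000000)); /* 4 */

    /* Going through a wider signed type avoids the overflow. */
    printf("%d\n", (int)(-(long long)0x80000000)); /* -2147483648 */
    return 0;
}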

Similarly, you need to take care if you want to use the left shift operator. 1<<31 is undefined behaviour on 32-bit machines with 32-bit (or smaller) ints; it will only evaluate to 2^31 if int is at least 33 bits, because left shift by k bits is only well-defined if k is strictly less than the number of non-sign bits of the integer type of the left-hand argument.

1LL<<31 is safe, since long long int is required to be able to represent 2^63 - 1, so its bit size must be greater than 32. So the form

(int)(-(1LL<<31))

is possibly the most readable. YMMV.


For any passing pedants, this question is tagged C, and the latest C draft (n1570.pdf) says, with respect to E1 << E2, where E1 has a signed type, that the value is defined only if E1 is nonnegative and E1 × 2^E2 "is representable in the result type" (§6.5.7 para 4).

That's different from C++, in which the application of the left-shift operator is defined if E1 is nonnegative and E1 × 2^E2 "is representable in the corresponding unsigned type of the result type" (§5.8 para. 2, emphasis added).

In C++, according to the most recent draft standard, the conversion of an integer value to a signed integer type is implementation-defined if the value cannot be represented in the destination type (§4.7 para. 3). The corresponding paragraph of the C standard, §6.3.1.3 para. 3, says that "either the result is implementation-defined or an implementation-defined signal is raised".

– rici
  • The conventional way to define `INT_MIN` is `#define INT_MIN (-(INT_MAX)-1)`, which avoids the problem of trying to take the negative of a long that you described. [*(reference)*](http://www.hardtoc.com/archives/119) – abelenky Jan 11 '16 at 23:33
  • @abelenky: I know that is how `INT_MIN` is conventionally defined *on 2's complement machines*. Otherwise, I suppose it would make sense to define it as `(-INT_MAX)`, because it makes sense to only type the number 2147483647 once. But it was (reasonably) suggested that calling printf with `-2147483647-1` is a little weird. Of course, now that we've had this discussion, we know why you have to do that. I only suggested that using an explicit cast lets you write the literal number in the form it would normally be written by those who are neither compilers nor language lawyers :) – rici Jan 11 '16 at 23:40
  • http://hardtoc.com/2009/07/16/int-min.html new URL for that (reference) link – Leland Oct 30 '22 at 21:33