Is this a bug in glibc printf?

Question

Using stdint.h from glibc (gcc SUSE Linux version 9.2.1, Intel Core i7 processor), I came across some very strange behaviour when printing INT32_MIN directly:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("%d\n", INT16_MIN);
    int a = INT16_MIN;
    printf("%d\n", a);

    printf("%ld\n", INT32_MIN);
    long b = INT32_MIN;
    printf("%ld\n", b);

    printf("%ld\n", INT64_MIN);
    long c = INT64_MIN;
    printf("%ld\n", c);
}

which outputs:

-32768
-32768
2147483648
-2147483648
-9223372036854775808
-9223372036854775808

Furthermore, if I try

printf("%ld\n", -INT32_MIN);

I get the same result, but with a compiler warning: integer overflow in expression '-2147483648' of type 'int' results in '-2147483648' [-Woverflow].

Not that this is incredibly bad for any existing program, actually it seems pretty harmless, but is this a bug in good old printf?

But it works when I print from `long b`, also there's the compiler warning, which still persists when using `%d` for `INT32_MIN`. — Arc, Nov 19 '21 at 22:54
Ah, ok, you guys are right, it's weird, but it's undefined per the standard, and `%d` works fine, thanks! — Arc, Nov 19 '21 at 23:00
Note that `-INT32_MIN` is undefined behavior too: signed integer overflow. — Nate Eldredge, Nov 20 '21 at 00:09
PSA: For a modern compiler, "it's weird" and "it's undefined" are pretty much synonymous. — Steve Summit, Nov 20 '21 at 00:51
@NateEldredge: great, thanks! This completes the explanation provided in the answer below, on why the compiler raises an overflow warning but results in the same value: `-INT32_MIN` overflows up from -2,147,483,648 to +2,147,483,648, which wraps again to -2,147,483,648, and is thus printed +2,147,483,648, according to Eric's answer below :) — Arc, Nov 20 '21 at 02:01
Although not a standard-required diagnostic, every gcc I've used for at least a decade warns when (constant) format specifier doesn't match the data; for your line with INT32_MIN I get `format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘int’ [-Wformat=]`. This should have been enough to tell you that INT32 is int not long, and printing an int value with a long format isn't safe. — dave_thompson_085, Nov 20 '21 at 06:54
Hi @dave_thompson_085, thanks for checking, mine is gcc (SUSE Linux) 9.2.1 20190820, and I don't get that warning _unless_ I specify it explicitly, probably it's a setting on my profile or else, in any case you are right, that warning should be enabled for all compilations. Will review my makefiles now, thanks. — Arc, Nov 20 '21 at 15:23

Eric Postpischil · Accepted Answer · 2021-11-20T01:55:46.943

Is this a bug in glibc printf?

No.

printf("%ld\n", INT32_MIN); … 2147483648

There is an easy way for this to happen. The second integer/pointer argument to a function should be passed in 64-bit register RSI. INT32_MIN is a 32-bit int with bit pattern 0x80000000, since that is the two’s complement pattern for −2,147,483,648. The rules for passing a 32-bit value in a 64-bit register are that it is passed in the low 32 bits, and the high bits are not used by the called routine. For this call, 0x80000000 was put into the low 32 bits, and the high bits happened to be set to zero.

Then printf examines the format string and expects a 64-bit long. So it goes looking in RSI for a 64-bit integer. Of course, the rules for passing a 64-bit integer are to use the entire register, so printf takes all 64 bits, 0x0000000080000000. That is the bit pattern for +2,147,483,648, so printf prints 2147483648.

Of course, the C standard does not define the behavior, so other things could happen, but this is a likely scenario for what did happen in the instance you observed.
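
A minimal sketch of that scenario in well-defined C, assuming the high half of the register was zero as described. The helper names are hypothetical, and the casts only model what the mismatched call happened to do; they are not how printf actually reads its arguments.

```c
#include <stdint.h>

/* Hypothetical helper: models a 32-bit int placed in the low half of a
   64-bit register whose high half is zero, then read as a 64-bit value. */
int64_t read_with_zero_high_bits(int32_t value)
{
    uint32_t low_bits = (uint32_t)value;  /* bit pattern kept, e.g. 0x80000000 */
    return (int64_t)(uint64_t)low_bits;   /* high 32 bits are all zero */
}

/* Hypothetical helper: what a proper int-to-64-bit conversion does instead,
   copying the sign bit into the high 32 bits. */
int64_t read_with_sign_extension(int32_t value)
{
    return (int64_t)value;
}
```

With INT32_MIN as input, the first helper returns 2147483648 and the second returns -2147483648, which matches the two outputs shown in the question.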

printf("%d\n", INT16_MIN); … -32768

Since int is 32 bits in your C implementation, the int16_t value INT16_MIN is automatically promoted to int for the function call, so this passes an int, and %d expects an int, so there is no mismatch, and the correct value is printed.

Similarly, the other printf calls in the question have arguments that match the conversion specifications (given the particular definitions of int16_t and such in your C implementation; they could mismatch in others), so their values are printed correctly.
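
Two ways to make the INT32_MIN call match its conversion specification, as a sketch: convert the argument to long explicitly, or use the PRId32 macro from <inttypes.h>. The wrapper names are hypothetical, and snprintf is used here only so that the result can be inspected.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: the cast makes the argument a long, which is
   what %ld expects. */
int format_as_long(char *buf, size_t size, int32_t value)
{
    return snprintf(buf, size, "%ld", (long)value);
}

/* Hypothetical wrapper: PRId32 expands to the conversion specifier that
   matches int32_t on the current implementation. */
int format_as_int32(char *buf, size_t size, int32_t value)
{
    return snprintf(buf, size, "%" PRId32, value);
}
```

Either form passes an argument whose type agrees with the format string, so the result no longer depends on what happens to be in the unused register bits.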

edited Nov 20 '21 at 01:55

answered Nov 19 '21 at 23:47

Eric Postpischil

"and the high bits happened to be set to zero.": Which, for OP's information, is particularly likely, since most instructions that would populate the low 32 bits (`ESI`) have the side effect of zeroing the high 32 bits: https://stackoverflow.com/questions/11177137/why-do-x86-64-instructions-on-32-bit-registers-zero-the-upper-part-of-the-full-6 – Nate Eldredge Nov 20 '21 at 00:12
Fantastic... that would never have occurred to me: the compiler parses the `INT32_MAX` contents in the context of a 32-bit integer (since no `L` or `LL` modifiers specified) and generates the bit pattern you mentioned, but stores it unconverted to a 64-bit register (since no one ever specified what it should do in that case), which in this particular case results in just the absolute (positive) value of the (negative) original. Explains why the other cases worked fine, and this particular one did not. A lesson learned on the subtleties of 'undefined behaviour', thanks for the fine answer. – Arc Nov 20 '21 at 01:53
In my comment above I meant `INT32_MIN`, not `INT32_MAX`. – Arc Nov 20 '21 at 01:59
To complete the answer, @NateEldredge provided a comment on my question explaining that `-INT32_MIN` is also undefined behaviour, which explains the overflow warning with the same print result: `-INT32_MIN` is -(-2,147,483,648) = +2,147,483,648, which wraps to -2,147,483,648, and is thus printed +2,147,483,648 according to your answer. – Arc Nov 20 '21 at 02:06
As commented above, compiler warning `-Wformat` should be enabled, as it flags this conversion issue. – Arc Nov 20 '21 at 15:28

adnan_e · Answer 2 · 2021-11-20T11:32:59.983

Let's observe this specific snippet

    long a = INT32_MIN;
    printf("%ld\n", a);          // #1
    printf("%ld\n", INT32_MIN);  // #2

It prints the following (as per your example, I haven't tried it myself, it's implementation dependent, as I'll note later):

-2147483648
2147483648

Now, in #1 we are passing a long to printf. The function works with a variadic argument list and based on the format specified, the value of our parameter will be interpreted as a signed long, and thus we end up with the value -2147483648.

For #2 we have a somewhat different situation. If you look at how INT32_MIN is defined, you'll see that it's a plain old C macro, for example:

# define INT32_MIN      (-2147483647-1)

so, our line

printf("%ld\n", INT32_MIN);

is actually

printf("%ld\n", (-2147483647-1));

Here the argument's type is int, not long. Compare it to this:

    int b = INT32_MIN;
    printf("%ld\n", b);

And it prints out the same value you observed (positive).
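
One way to check the argument's type at compile time is C11's _Generic. This is only a sketch: the macro and function names are hypothetical, it assumes a C11 compiler, and the result for INT32_MIN depends on how the implementation defines the macro (glibc's definition above yields an int).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers: evaluate to 1 when the expression has exactly
   the named type, 0 otherwise (requires C11 _Generic). */
#define IS_INT(x)  _Generic((x), int: 1, default: 0)
#define IS_LONG(x) _Generic((x), long: 1, default: 0)

/* Prints which of the two types each expression has. */
void report_int32_min_type(void)
{
    printf("INT32_MIN is int:        %d\n", IS_INT(INT32_MIN));
    printf("INT32_MIN is long:       %d\n", IS_LONG(INT32_MIN));
    printf("(long)INT32_MIN is long: %d\n", IS_LONG((long)INT32_MIN));
}
```

Only the explicitly converted expression has type long, which is why assigning to a long variable first makes the %ld call well defined.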

Now it boils down to what your question should actually be:

What happens when printf receives a value of a type that isn't in the format specifier?

Undefined behavior!
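
To illustrate why the callee cannot catch the mismatch, here is a sketch of a hypothetical variadic function (sum_longs is not a library function). Like printf, it receives no type information for its variable arguments and has to trust that the caller passed what it is about to read.

```c
#include <stdarg.h>

/* Hypothetical variadic function: reads `count` variable arguments,
   each of which must have been passed as a long. */
long sum_longs(int count, ...)
{
    va_list args;
    long total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        /* Defined only if the caller really passed a long here. */
        total += va_arg(args, long);
    }
    va_end(args);
    return total;
}
```

A call such as sum_longs(2, (long)INT32_MIN, 1L) is well defined, whereas sum_longs(1, INT32_MIN) passes an int where a long is read, which is the same mismatch as in the question.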

`printf` in `printf("%ld\n", a);` does not accept a `void *`, nor is there one in the arguments. — Eric Postpischil, Nov 19 '21 at 23:37
My bad, I wrongly assumed variadic arguments would be interpreted as void* and the function impl would be responsible for the interpretation. — adnan_e, Nov 20 '21 at 11:30
