3

Nowadays compilers optimize crazy things. gcc and clang in particular sometimes do really crazy transformations.

So, I'm wondering why the following piece of code is not optimized:

#include <stdio.h>
#include <string.h>

int main() {
    printf("%d\n", 0);
}

I would expect a heavily optimizing compiler to generate code equivalent to this:

#include <stdio.h>
#include <string.h>

int main() {
    printf("0\n");
}

But gcc and clang don't apply the format at compile time (see https://godbolt.org/z/Taa44c1n7). Is there a reason why this optimization is not applied?

I regularly see format arguments that are known at compile time, so I guess it could make sense (especially for floating-point values, since formatting those is probably relatively expensive).

TylerH
Kevin Meier
  • Probably because no one has bothered writing such an optimization yet? That said, I don't think real-world programs have too many bottlenecks of this sort... – AKX Mar 23 '21 at 11:03
  • Maybe such constructs do not occur often enough to be worth the effort. – Jonathan Leffler Mar 23 '21 at 11:03
  • `printf("%s\n", s)` is optimized to `puts(s)`, so known format strings are optimized to some extent. – interjay Mar 23 '21 at 11:03
  • https://en.wikipedia.org/wiki/Hindu%E2%80%93Arabic_numeral_system#Glyph_comparison – Hans Passant Mar 23 '21 at 13:06
  • See also [Why doesn't GCC optimize this call to printf?](https://stackoverflow.com/questions/37435984/why-doesnt-gcc-optimize-this-call-to-printf) and [`gimple_fold_builtin_printf`](https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-fold.c#L3690). The latter shows that only simple string and char formats are optimized to `puts` and `putchar`. – dxiv Mar 23 '21 at 13:29

4 Answers

2

Suppose that at various spots within a function one has the following four lines:

printf("%d\n", 100);
printf("%d\n", 120);
printf("%d\n", 157);
printf("%d\n", 192);

When compiling for a typical ARM microcontroller, each of those printf calls would take eight bytes, and all four calls would share four bytes that hold a copy of the address of the format string, plus four bytes for the string itself. The total for code and data would thus be 40 bytes.

If one were to replace that with:

printf("100\n");
printf("120\n");
printf("157\n");
printf("192\n");

then each printf call would need only six bytes of code, but would need its own separate five-byte string in memory, along with four bytes to hold its address. The total code plus data requirement would thus increase from 40 bytes to 60. Far from being an optimization, this would actually increase the code space requirement by 50%.

Even if there were only one printf call, the savings would be minuscule. Eight bytes for the original version of the call plus eight bytes of data overhead would be 16 bytes. The second version of the code would require six bytes of code plus seven bytes of data, for a total of 13 bytes. An optimization that would save a little storage in the best case, and cost a lot in scenarios that are hardly implausible, isn't really an optimization.
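The four-call arithmetic above can be written down as a small sketch. The per-call costs are simply the figures assumed in this answer for a typical ARM microcontroller, not measurements, and the names `shared_format_bytes`, `folded_bytes` and `check_estimates` are made up for illustration.

```c
#include <assert.h>

/* Assumed costs in bytes, taken from the estimate in this answer. */
enum {
    CALL_WITH_ARG    = 8, /* code for one printf("%d\n", N) call */
    CALL_STRING_ONLY = 6, /* code for one printf("N\n") call     */
    POINTER_SIZE     = 4  /* one stored address of a string      */
};

/* Code plus data when every call shares a single format string. */
static unsigned shared_format_bytes(unsigned calls, unsigned fmt_size)
{
    return calls * CALL_WITH_ARG + POINTER_SIZE + fmt_size;
}

/* Code plus data when every call gets its own pre-formatted string. */
static unsigned folded_bytes(unsigned calls, unsigned str_size)
{
    return calls * (CALL_STRING_ONLY + POINTER_SIZE + str_size);
}

/* Reproduces the four-call totals; sizeof counts the terminating NUL. */
static void check_estimates(void)
{
    assert(shared_format_bytes(4, sizeof "%d\n") == 40);
    assert(folded_bytes(4, sizeof "100\n") == 60);
}
```

Calling `check_estimates()` from a `main` reproduces the 40 versus 60 byte comparison. The same two functions can be called with `calls` set to 1 to explore the single-call case.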

supercat
2

Why are compile-time known format-strings not optimized?

Some potential optimizations are easy for compilers/compiler developers to support and have large impact on the quality/performance of the code the compiler generates; and some potential optimizations are extremely difficult for compilers/compiler developers to support and have a small impact on the quality/performance of the code the compiler generates. Obviously compiler developers are going to spend most of their time on the "easier with higher impact" optimizations, and some of the "harder with less impact" optimizations are going to be postponed (and possibly never implemented).

Something like optimizing printf("%d\n", 0); into a puts() (or better, an fputs()) looks like it'd be relatively easy to implement (but would have a very small performance impact, partly because it'd be rare for that to occur in source code anyway).

The real problem is that it's a "slippery slope".

If compiler developers do the work needed for the compiler to optimize printf("%d\n", 0);, then what about optimizing printf("%d\n", x); too? If you optimize those then why not also optimize printf("%04d\n", 0); and printf("%04d\n", x);, and floating point, and more complex format strings? Surely printf("Hello %04d\n Foo is %0.6f!\n", x, y); could be broken down into a series of smaller functions (fputs(), an integer-to-string routine, ..)?

This "slippery slope" means that the complexity increases rapidly (as the compiler supports more permutations of format strings) while the performance impact barely improves at all.

If compiler developers are going to spend most of their time on the "easier with higher impact" optimizations, they're not going to be enthusiastic about clawing further up that particular slippery slope.
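To give a feel for how quickly the work adds up, here is a rough sketch of what a hand-written lowering of just the `%04d` conversion might look like. It is only an illustration under my own assumptions, not how gcc or clang do (or would do) it; `lower_04d`, `agrees_with_printf` and `check_lowering` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes what sprintf(out, "%04d", x) would write
   and returns the length. out must have room for at least 12 bytes. */
static size_t lower_04d(char *out, int x)
{
    char digits[16];
    size_t n = 0, len = 0;
    /* Work in unsigned so that negating INT_MIN cannot overflow. */
    unsigned int u = (x < 0) ? 0u - (unsigned int)x : (unsigned int)x;

    /* Collect the decimal digits, least significant first. */
    do {
        digits[n++] = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0u);

    if (x < 0)
        out[len++] = '-';
    /* Zero-pad until sign plus digits fill at least four columns. */
    while (len + n < 4)
        out[len++] = '0';
    /* Emit the digits, most significant first. */
    while (n > 0)
        out[len++] = digits[--n];
    out[len] = '\0';
    return len;
}

/* Returns 1 if the sketch matches the library's "%04d" output for x. */
static int agrees_with_printf(int x)
{
    char mine[32], ref[32];
    lower_04d(mine, x);
    snprintf(ref, sizeof ref, "%04d", x);
    return strcmp(mine, ref) == 0;
}

static void check_lowering(void)
{
    assert(agrees_with_printf(0));
    assert(agrees_with_printf(7));
    assert(agrees_with_printf(-5));
    assert(agrees_with_printf(12345));
}
```

That is a couple of dozen lines for one width, one flag and one conversion. Every further flag, precision or type would need its own variant, which is the complexity growth described above.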

Brendan
0

How could the compiler know what kind of printf() function you will link?

There is a standard meaning of this function, but nobody forces you to use the standard library.
So it's beyond the compiler's competence to do in advance what the function is supposed to do.

CiaPan
  • Actually I believe the compiler is allowed to assume the standard library, unless you use `-ffreestanding`. For example, when you call memset or memcpy or strlen, it doesn't actually have to call memset or memcpy or strlen. – user253751 Mar 23 '21 at 11:06
  • Yet the compiler (both gcc and clang) manages to optimize `printf("%s\n", s)` to `puts(s)`. The compiler can know what `printf` does because it's part of the standard library. – interjay Mar 23 '21 at 11:06
  • C 2018 7.1.3 1 says, of the standard library routines, “All identifiers with external linkage in any of the following subclauses (including the future library directions) and `errno` are always reserved for use as identifiers with external linkage.” This means `printf` with external linkage is reserved for the use of the C implementation; if you compile a program using `printf` with external linkage, the C implementation may assume it refers to the library function. (You can use `printf` for your own purposes as a local identifier with no linkage or internal linkage.) – Eric Postpischil Mar 23 '21 at 11:54
  • @user253751 it's worth saying that memset() memcpy() strlen() and so on only cope with internal representation of data; printf() is different, I think. Particularly, a `printf("\n")` should output a newline on unix, and a CR/LF on other systems. – linuxfan says Reinstate Monica Mar 23 '21 at 16:23
  • @linuxfansaysReinstateMonica The compiler is free to use any mechanism to output that LF or CRLF – user253751 Mar 23 '21 at 16:39
-1

My guess is that how printf() converts from the internal C format to the user representation is a matter for the operating system, with its locale conventions (often/always configurable?), so it would be wrong to assume that printing an "integer 0" must use the ASCII character 48 (extreme example, I know). In particular, and here I may be wrong, a printf("\n") outputs a newline on unix, and a CR+LF on other systems. I am not sure where this transformation is done and, to be honest, I think this is a glitch: I've seen, more than once, programs that output the wrong line termination, and I think this is because the C compiler should not hard-code a line termination character into the program. This is the same as assuming that a particular number has to be written in a particular way.

Small addition: below this question there's a comment saying that it is possible to change the behavior of format specifiers at runtime. If this is true, then we have the final answer to the question... can someone confirm?
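One way to probe the locale part of this guess is a small experiment like the following sketch. As far as I know, the standard makes the radix ("decimal point") character of floating-point conversions depend on the current `LC_NUMERIC` locale, while plain `%d` is not affected. The locale name `"de_DE.UTF-8"` is only an example and may not be installed, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Returns the first byte printf currently emits as the radix character,
   found by formatting 1.5 and reading the byte after the '1'. */
static char printf_radix_char(void)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%.1f", 1.5);
    return buf[1];
}

/* Returns 1 if "%d" formats 0 as the single character '0'. */
static int zero_prints_as_plain_zero(void)
{
    char buf[8];
    snprintf(buf, sizeof buf, "%d", 0);
    return buf[0] == '0' && buf[1] == '\0';
}

static void check_locale_effects(void)
{
    /* A C program starts in the "C" locale, whose radix character is '.'. */
    assert(printf_radix_char() == localeconv()->decimal_point[0]);
    assert(zero_prints_as_plain_zero());

    /* Assumption: this locale exists on the machine; skip the rest if not. */
    if (setlocale(LC_NUMERIC, "de_DE.UTF-8") != NULL) {
        /* The same "%.1f" format may now produce "1,5" instead of "1.5". */
        assert(printf_radix_char() == localeconv()->decimal_point[0]);
        /* Plain "%d" is expected to stay the same. */
        assert(zero_prints_as_plain_zero());
        setlocale(LC_NUMERIC, "C"); /* restore the startup locale */
    }
}
```

If this reading is right, folding a floating-point format at compile time would only be safe when the compiler can prove which locale is active, whereas `printf("%d\n", 0)` would not have that particular problem.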

  • The compiler knows which architecture it's compiling for, so this isn't an issue. It can output CR or CRLF accordingly. – interjay Mar 23 '21 at 18:15