1

Take the C language as an example for this question. In C there are many format specifiers, such as %i, %d, %c, %s, %f, etc. During compilation, the C code is converted into assembly code.

e.g. 1

char *a = "Hello World !\n";

and the assembly code is :

.LC0:
        .string "Hello World !\n"

e.g. 2

printf("Hello %s", "World !");

and the assembly code is :

.LC0:
        .string "World !"
.LC1:
        .string "Hello %s"

and the output is :

Hello World !

So how does the %s format specifier work in the compilation process?

  • 4
    The format specifiers are not handled by the compiler but by the library functions such as `printf` and friends. – Jabberwocky Dec 19 '20 at 14:08
  • 3
    It doesn't, although the compiler might check it against the arguments provided, or replace the whole thing with `puts()`: it is interpreted at run-time by the formatting function. – Weather Vane Dec 19 '20 at 14:09
  • 1
    If you want to know exactly how it works, have a look at some printf implementations; you can find these easily on the internet. Be aware, though, that this is not something I'd recommend to beginners. – Jabberwocky Dec 19 '20 at 14:10
  • 1
    ...so there is a high run-time cost in time and memory. For these reasons the `printf()` function family is omitted from some embedded libraries by default. – Weather Vane Dec 19 '20 at 14:12
  • I suspect that the real question you have is how is a `declarative statement` converted to `imperative code`. Try searching for that on the Internet. Another similar question would be `How is SQL converted to code?` – Guy Coder Dec 22 '20 at 11:28

2 Answers

3

The string passed to printf is a run-time value. I'm not aware of any compilers that "pre-compile" a printf statement that just happens to have only const values being supplied. At best they validate that the arguments you supply are valid.
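To illustrate why the format string is a run-time value, here is a minimal sketch. `format_value` is a hypothetical helper, not part of any library; it picks one of two format strings while the program is running, so the conversion can only be decided inside `snprintf`, not by the compiler.

```c
#include <stdio.h>

/* Hypothetical helper: writes `value` into `buf` as hex or decimal.
   The format string is selected at run time, so the compiler cannot
   know in advance which conversion snprintf will perform. */
int format_value(char *buf, size_t size, unsigned value, int as_hex)
{
    const char *fmt = as_hex ? "%x" : "%u";   /* chosen while running */
    return snprintf(buf, size, fmt, value);   /* library parses fmt here */
}
```

Calling `format_value(buf, sizeof buf, 255u, 1)` fills `buf` with `ff`, while passing `0` as the last argument gives `255`; the same call site yields both results.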

The C language is unlikely to expand to include things like Template Literals from JavaScript or even std::format from C++, but if it did, there would be an opportunity to do compile-time analysis and possible optimization.

tadman
  • 3
    Actually, that is only part of the truth. Many compilers parse the format literals and check whether the parameters match. So in many implementations the compiler is also involved in the process, and often some simple compile-time optimizations are performed: https://godbolt.org/z/sa9Pz6 – 0___________ Dec 19 '20 at 15:12
  • 1
    You can also create your own printf-like functions and let the compiler do the same checks if you specify some pragmas or attributes. – 0___________ Dec 19 '20 at 15:17
  • @P__JsupportswomeninPoland: indeed, that optimization still happens at `-O0` if you don't hide it behind a separate variable. https://godbolt.org/z/3fsvMn (`-O0` treats everything sort of like volatile). You'd need `-fno-builtin-printf` to make GCC always pass the args specified to the printf library function. There are many Q&As about printf being treated as a compiler built-in, e.g. [-O2 optimizes printf("%s\n", str) to puts(str)](https://stackoverflow.com/q/36343733) and a good canonical with many links: [Compiler changes printf to puts](https://stackoverflow.com/q/60080021) – Peter Cordes Dec 19 '20 at 17:34
  • @PeterCordes as I wrote: the compiler parses the string literal (or even a variable in some circumstances) https://godbolt.org/z/adnKcq. `-O0 treats everything sort of like volatile` no - it simply does not optimize, but volatile means something completely different https://godbolt.org/z/aq79aY – 0___________ Dec 19 '20 at 19:01
  • @P__JsupportswomeninPoland: I was agreeing with you and adding links to further explanations. Re: volatile: the difference for multiple references within one statement is why I said *sort of*. If you think of straightforward hand-written asm as "normal" where local vars can live in registers, storing and reloading C vars between statements is an extra anti-optimization at `-O0` to ensure consistent debugging. [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394). – Peter Cordes Dec 19 '20 at 19:47
  • @P__J: But sure, you could equivalently describe it as not doing any constant-propagation or other optimization between statements. 2 ways of describing the same reality that we both understand. BTW, GCC can't "not optimize"; it always transforms through GIMPLE on the way from source to asm, so `-O0` is [just the minimum needed to make asm](https://stackoverflow.com/a/33284629). And GCC -O0 certainly does still optimize within single statements, e.g. still using a multiplicative inverse for `x /= 10;`. – Peter Cordes Dec 19 '20 at 19:50
  • @PeterCordes modern debuggers (including gdb) do not have any problems with the variables stored only in the registers. So for the debugging, storing to the memory location is not needed anymore (unless the variable was completely optimized out, which is a completely different case). `GCC can't "not optimize"` It was a shorter version of : with the -O0 optimization. – 0___________ Dec 19 '20 at 21:22
  • @P__JsupportswomeninPoland: GNU's idea of "consistent debugging" includes *changing* the value of a C variable while stopped at a breakpoint and have execution continue as if you did that in the C abstract machine. Also, you can use GDB's `jump` command to continue execution at a different line within the same function, and again execution should proceed as if you did that in the C abstract machine (within limits, like skipping a VLA declaration might be a problem). So every statement is compiled to an independent block of code with everything except `register` variables spilled to memory. – Peter Cordes Dec 19 '20 at 21:28
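The comment above about writing your own printf-like functions can be sketched as follows. This assumes GCC or Clang, whose `format` function attribute asks the compiler to check the arguments against the format literal; `log_to_buffer` and the `PRINTF_LIKE` macro are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Parameter 3 is the format string, variadic arguments start at 4. */
int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);  /* formatting still happens at run time */
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `log_to_buffer(buf, sizeof buf, "%s", 42)` should draw a `-Wformat` diagnostic when warnings are enabled, even though the actual formatting is still done by the library.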
1

The compilation process does nothing with the strings (or the format specifiers).

As you can see, both strings are copied into the .rodata segment literally as written, and pointers to them are passed to the printf(3) function. It is this function that handles (and interprets) the contents of the first string to decide how many arguments you are passing to it and of what types. The compiler does nothing in this case except place the strings in some area and pass pointers to them to the function.
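As a rough sketch of what "interpreting the first string" means, the function below walks a format string at run time and pulls arguments with `va_arg` as it meets each specifier. `mini_format` and `put_char` are hypothetical names, and only `%s`, `%c` and `%%` are handled; a real printf implementation is considerably more involved.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store c if there is room (keeping one byte for '\0'); always count it. */
static void put_char(char *buf, size_t size, size_t *pos, char c)
{
    if (*pos + 1 < size)
        buf[*pos] = c;
    (*pos)++;
}

/* Returns the length the full output would have, like snprintf does. */
size_t mini_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {              /* ordinary character: copy it */
            put_char(buf, size, &pos, *p);
            continue;
        }
        p++;                          /* move to the conversion character */
        if (*p == '\0')               /* lone '%' at the end: stop */
            break;
        switch (*p) {
        case 's': {                   /* %s: next argument is a char pointer */
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put_char(buf, size, &pos, *s++);
            break;
        }
        case 'c':                     /* %c: char arrives promoted to int */
            put_char(buf, size, &pos, (char)va_arg(ap, int));
            break;
        case '%':                     /* %%: a literal percent sign */
            put_char(buf, size, &pos, '%');
            break;
        default:                      /* unsupported: copy it verbatim */
            put_char(buf, size, &pos, '%');
            put_char(buf, size, &pos, *p);
            break;
        }
    }
    va_end(ap);

    if (size > 0)
        buf[pos < size ? pos : size - 1] = '\0';
    return pos;
}
```

Note that the function has no way to verify that the caller really passed a `char *` for `%s`; it simply trusts the format string, which is why a mismatch between specifiers and arguments is undefined behaviour in the real `printf` too.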

Luis Colorado