
The question is plain and simple: s is a string. I suddenly got the idea to try printf(s) to see if it would work, and I got a warning in one case and none in the other.

char* s = "abcdefghij\n";
printf(s);

// Warning raised with gcc -std=c11: 
// format not a string literal and no format arguments [-Wformat-security]

// On the other hand, if I use 

char* s = "abc %d efg\n";
printf(s, 99);

// I get no warning whatsoever, why is that?

// Update, I've tested this:
char* s = "random %d string\n";
printf(s, 99, 50);

// Results: no warning, output "random 99 string".

So what's the underlying difference between printf(s) and printf("%s", s) and why do I get a warning in just one case?

Sourav Ghosh
Mikael
  • Really surprising, and very interesting. I confirm this behavior, there could be an explanation but until someone explains this I consider it a bug in the diagnostic. – Iharob Al Asimi Sep 09 '16 at 16:02
  • Think a little while what could happen if you have the second string, with an embedded format code, and don't pass an argument. Then read about [*undefined behavior*](https://en.wikipedia.org/wiki/Undefined_behavior). – Some programmer dude Sep 09 '16 at 16:02
  • @JoachimPileborg, so you are saying that by not providing a string literal, the compiler has no way to know how many arguments will be needed? So that's why if I provide no more arguments I get a warning, but if I provide at least one, I don't. I've added one more example on the question and I guess it supports what I said. – Mikael Sep 09 '16 at 16:06
  • You may notice a difference if you use `const char* const s`, too. – aschepler Sep 09 '16 at 16:10
  • Just don't pass a variable as the first argument to printf. Don't do it. It's a security hole. – Ryan Sep 09 '16 at 16:12
  • Another difference is compile-time analyzability and the optimizations that can, or cannot, then be applied. – chux - Reinstate Monica Sep 09 '16 at 16:42
  • The fundamental difference is that `printf(s)` is a bug waiting to happen (and a potential [security hole](https://www.owasp.org/index.php/Format_string_attack)), whereas `printf("%s", s)` is just an inefficient way to write `fputs(s, stdout)`. – Ilmari Karonen Sep 09 '16 at 22:24
  • Security. As others have mentioned, if s comes from the user or some varying source, it can contain % formatters that you probably aren't expecting, leading to a crash. The second version mitigates this. – pilkch Sep 10 '16 at 10:12

5 Answers


In the first case, the non-literal format string could perhaps come from user code or user-supplied (run-time) data, in which case it might contain %s or other conversion specifications, for which you've not passed the data. This can lead to all sorts of reading problems (and writing problems if the string includes %n — see printf() or your C library's manual pages).

In the second case, the format string controls the output and it doesn't matter whether any string to be printed contains conversion specifications or not (though the code shown prints an integer, not a string). The compiler (GCC or Clang is used in the question) assumes that because there are arguments after the (non-literal) format string, the programmer knows what they're up to.

The first is a 'format string' vulnerability. You can search for more information on the topic.

GCC knows that most times the single argument printf() with a non-literal format string is an invitation to trouble. You could use puts() or fputs() instead. It is sufficiently dangerous that GCC generates the warnings with the minimum of provocation.
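As a minimal sketch of that advice, assuming the text is meant to be printed exactly as it is: the helper name write_verbatim below is hypothetical (it is not in the question or in any library) and only wraps the standard fputs().

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): writes text to stream
 * exactly as given. fputs() does not interpret '%', so conversion
 * specifications inside text are copied out like any other characters.
 * Returns 0 on success, -1 if fputs() reports an error. */
int write_verbatim(FILE *stream, const char *text)
{
    return fputs(text, stream) == EOF ? -1 : 0;
}
```

A call such as write_verbatim(stdout, s) prints s unchanged even if s happens to contain %s or %n, and no -Wformat-security warning is involved because no format string is being parsed.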

The more general problem of a non-literal format string can also be problematic if you are not careful — but extremely useful assuming you are careful. You have to work harder to get GCC to complain: it requires both -Wformat and -Wformat-nonliteral to get the complaint.

From the comments:

So ignoring the warning, as if I really know what I am doing and there will be no errors, is one or another more efficient to use or are they the same? Considering both space and time.

In your three printf() statements, given the tight context that the variable s is assigned immediately above each call, there is no actual problem. For the first one, though, you could use puts(s) if you omitted the newline from the string, or fputs(s, stdout) with the string as it is, and get the same result without the overhead of printf() parsing the entire string only to find out that it is all simple characters to be printed.

The second printf() statement is also safe as written; the format string matches the data passed. There is no significant difference between that and simply passing the format string as a literal — except that the compiler can do more checking if the format string is a literal. The run-time result is the same.

The third printf() passes more data arguments than the format string needs, but that is benign. It isn't ideal, though. Again, the compiler can check better if the format string is a literal, but the run-time effect is practically the same.

From the printf() specification linked to at the top:

Each of these functions converts, formats, and prints its arguments under control of the format. The format is a character string, beginning and ending in its initial shift state, if any. The format is composed of zero or more directives: ordinary characters, which are simply copied to the output stream, and conversion specifications, each of which shall result in the fetching of zero or more arguments. The results are undefined if there are insufficient arguments for the format. If the format is exhausted while arguments remain, the excess arguments shall be evaluated but are otherwise ignored.
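The last sentence of that quotation can be illustrated with a small sketch. It is only an illustration under assumptions: the names next_value, format_with_extra and extra_arg_evaluations are hypothetical, and snprintf() is used instead of printf() so that the result can be inspected.

```c
#include <stdio.h>

/* Counts how often the extra argument expression was evaluated. */
static int calls = 0;

/* Hypothetical helper with a visible side effect. */
static int next_value(void)
{
    return ++calls;
}

/* Formats "random %d string" with one argument too many.
 * The surplus argument is evaluated (next_value() runs), but no
 * conversion specification consumes it, so it never appears in out. */
int format_with_extra(char *out, size_t size)
{
    const char *fmt = "random %d string";
    return snprintf(out, size, fmt, 99, next_value());
}

/* Reports how many times the surplus argument was evaluated. */
int extra_arg_evaluations(void)
{
    return calls;
}
```

After one call to format_with_extra(), the buffer holds "random 99 string" and extra_arg_evaluations() returns 1, which matches "evaluated but otherwise ignored".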

In all these cases, there is no strong indication of why the format string is not a literal. However, one reason for wanting a non-literal format string might be that sometimes you print the floating point numbers in %f notation and sometimes in %e notation, and you need to choose which at run-time. (If it is simply based on value, %g might be appropriate, but there are times when you want the explicit control — always %e or always %f.)
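A run-time choice between %f and %e could look roughly like the following sketch. The function name format_value and its scientific flag are assumptions made for illustration; the result is written to a buffer with snprintf() so it can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: formats value in %e notation when scientific is
 * non-zero and in %f notation otherwise. The format string is chosen at
 * run time, so it is a non-literal format; both candidates expect
 * exactly one double, which is what is passed. */
int format_value(char *out, size_t size, double value, int scientific)
{
    const char *fmt = scientific ? "%e" : "%f";
    return snprintf(out, size, fmt, value);
}
```

With a conforming C library, format_value(buf, sizeof buf, 1234.5, 0) yields "1234.500000" and the same call with scientific set to 1 yields "1.234500e+03". Because every possible format takes the same argument list, the call stays safe even though the compiler cannot check it as thoroughly as a literal.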

Jonathan Leffler
  • I don't see how the first two paragraphs are explaining this behavior. The format string could come from the user in both cases. – Eugene Sh. Sep 09 '16 at 16:11
  • This is a great answer! So ignoring the warning, as if I really know what I am doing and there will be no errors, is one or another more efficient to use or are they the same? Considering both space and time. – Mikael Sep 09 '16 at 16:11
  • @MikaelMello the more efficient way is to use `puts`. No format to parse, just simple character output – phuclv Sep 09 '16 at 16:30
  • @EugeneSh. yep. That's a line in the sand the compiler has to draw. If it always complained about non-literal format strings, it would make e.g. internationalised messages annoying to use. Based on that behaviour, I think it just assumes that without the other arguments, you mixed up `printf` with `puts`, but with the arguments, you seem to know what you are doing. – ilkkachu Sep 09 '16 at 19:43
  • @MikaelMello, since this was about `gcc`, I think it replaces a `printf("some string\n");` with `puts("some string");` (with optimizations enabled etc.) – ilkkachu Sep 09 '16 at 19:45
  • @ilkkachu not only that but it also replaces `printf("%s\n", some_string)` with `puts(some_string)` which might be significant if you think `%s` is defined with null pointers... – Antti Haapala -- Слава Україні Feb 15 '19 at 03:34

The warning says it all.

First, the issue itself: as per its signature, the first parameter to printf() is a format string, which can contain format specifiers (conversion specifiers). If the string contains a format specifier and the corresponding argument is not supplied, the call invokes undefined behavior.

So, a cleaner (or safer) approach to printing a string that needs no format specification is puts(s); rather than printf(s); (the former does not process s for any conversion specifiers, removing the reason for the possible UB in the latter case). You can choose fputs() if you're worried about the trailing newline that puts() adds automatically.


That said, regarding the warning option, -Wformat-security from the online gcc manual

At present, this warns about calls to printf and scanf functions where the format string is not a string literal and there are no format arguments, as in printf (foo);. This may be a security hole if the format string came from untrusted input and contains %n.

In your first case, there's only one argument supplied to printf(), and it is not a string literal but a variable, which could very well be generated or populated at run time; if it contains unexpected format specifiers, it may invoke UB. The compiler has no way to check for the presence of any format specifier in it. That is the security problem there.

In the second case, an accompanying argument is supplied, so the format string is not the only argument passed to printf() and the first argument need not be verified. Hence the warning is not there.


Update:

Regarding the third one, with more arguments than the supplied format string requires:

printf(s, 99, 50);

quoting from C11, chapter §7.21.6.1

[...] If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored. [...]

So, passing excess arguments is not a problem at all (from the compiler's perspective), and the behavior is well defined. There is no scope for any warning there.

Sourav Ghosh
  • `printf(s)` and `puts(s)` are not equivalent. First, of course, `printf` processes format specifiers. Second, `puts` adds a newline. – Keith Thompson Sep 09 '16 at 16:16
  • @KeithThompson I never said, it is a replacement, sir. It's a safer option, in case the string contains format specifiers, they will be ignored by `puts()`. That's the point. – Sourav Ghosh Sep 09 '16 at 16:17
  • @KeithThompson and regarding the _newline_, there's `fputs()`, anyway. :) – Sourav Ghosh Sep 09 '16 at 16:20
  • @SouravGhosh So, if the programmer knows what he is doing, that is, the string is "secure" and the arguments are right, there is no difference, right? But I think that if you have this, you would not need to have a string as the first argument at all, so there is no point in ever using it this way? Edit: Jonathan Leffler just gave a reason to use it this way. My point still persists though, if I am certain that the string is secure, there is no problem, right? – Mikael Sep 09 '16 at 16:21
  • @MikaelMello See, let me get the basics clear. You can read `printf()` as _print with formatting/formatted printer_. So, in case, you _don't need_ a format specifier, why use `printf()`? Stick to `puts()`/`fputs()`. Yes, when you need to print the formatted output, welcome to `printf()` family. – Sourav Ghosh Sep 09 '16 at 16:23
  • @MikaelMello Just because you __can__, you don't __have to__ use something and make extra assumptions. Stick to the function/ API which is designed for a particular purpose. You're getting my point there? Why to leave the security of a program to __if , but, then__ ? – Sourav Ghosh Sep 09 '16 at 16:24
  • @SouravGhosh, I see your point and agree with it. I was actually just curious regarding the behavior and wanted to make sure that it was secure to use it that way (yes, using format specifiers), it apparently is but I see no point in really using it anyway, thanks for your answers! – Mikael Sep 09 '16 at 16:28
  • @MikaelMello Just for elaboration's sake, I've added a bit more explanation on the third case you have added. Have a look if you want. :) – Sourav Ghosh Sep 09 '16 at 16:32

There are two things in play in your question.

The first is covered succinctly by Jonathan Leffler: the warning you're getting is because the format string isn't a literal and no format arguments are passed with it.

The other is the mystery of why the compiler doesn't issue a warning that your number of arguments doesn't match the number of specifiers. The short answer is "because it doesn't," but more specifically, printf is a variadic function. It takes any number of arguments after the initial format specification - from 0 on up. The compiler can't check to see if you gave the right amount; that's up to the printf function itself, and leads to the undefined behavior that Joachim mentioned in comments.

EDIT: I'm going to give further answer to your question, as a means of getting on a small soapbox.

What's the difference between printf(s) and printf("%s", s)? Simple - in the latter, you're using printf as it's declared. "%s" is a const char *, and it will subsequently not generate the warning message.

In your comments to other answers, you mentioned "Ignoring the warning...". Don't do this. Warnings exist for a reason, and should be resolved (otherwise they're just noise, and you'll miss warnings that actually matter among the cruft of all the ones that don't.)

Your issue can be resolved in several ways.

const char* s = "abcdefghij\n";
printf(s);

will resolve the warning, because you're now using a const pointer, and there are none of the dangers that Jonathan mentioned. (You could also declare it as const char* const s, but don't have to. The first const is important, because it then matches the declaration of printf, and because const char* s means that characters pointed to by s can't change, i.e. the string is a literal.)

Or, even simpler, just do:

printf("abcdefghij\n");

This is implicitly a const pointer, and also not a problem.

Community
Scott Mermelstein
  • For questions of `const char*` vs `char * const`, please refer to http://stackoverflow.com/questions/1143262/what-is-the-difference-between-const-int-const-int-const-and-int-const – Scott Mermelstein Sep 09 '16 at 18:41
  • Mmh. "The compiler can't check to see if you gave the right amount [of arguments]" -- But `gcc` __can__: try with `printf("%d\n");` or `printf("%d\n", 123, 456);` (-Wformat and -Wformat-extra-args resp., both are set by -Wall on gcc 4.9.2 (at least on my system)). – ilkkachu Sep 10 '16 at 07:46
  • Also, the constness of the format string doesn't come into it... `const char* s = "abcdefghij\n"; printf(s);` gives (with -Wformat-security): `printf.c:8:2: warning: format not a string literal and no format arguments [-Wformat-security]` -- apparently (as it says) it only checks the contents of the format string if it's a literal. If I had to guess, it's an implementation detail (not having to follow the contents of the string through the program for this.) – ilkkachu Sep 10 '16 at 07:50

So what's the underlying difference between printf(s) and printf("%s", s)

"printf(s)" will treat s as a format string. If s contains format specifiers then printf will interpret them and go looking for varargs. Since no varargs actually exist this will likely trigger undefined behaviour.

If an attacker controls "s" then this is likely to be a security hole.

printf("%s",s) will just print what is in the string.
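The difference can be observed without invoking undefined behaviour by using a string whose only directive is "%%", which fetches no argument. The sketch below is illustrative only: the names as_data and as_format are hypothetical, snprintf() stands in for printf() so the output can be compared, and the pragmas (understood by GCC and Clang) silence the very -Wformat-security warning discussed here.

```c
#include <stdio.h>

/* Treats s as data: every character, including '%', is copied as is. */
int as_data(char *out, size_t size, const char *s)
{
    return snprintf(out, size, "%s", s);
}

/* Treats s as a format string: directives in s are interpreted.
 * This is only safe when the caller guarantees that s contains no
 * conversion specification that fetches an argument. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-security"
int as_format(char *out, size_t size, const char *s)
{
    return snprintf(out, size, s);
}
#pragma GCC diagnostic pop
```

Given the ten-character string "100%% sure", as_data() reproduces all ten characters, while as_format() interprets "%%" and produces the nine-character "100% sure". With %s, %d or %n in place of "%%", as_format() would go looking for arguments that were never passed.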

and why do I get a warning in just one case?

Warnings are a balance between catching dangerous stupidity and not creating too much noise.

C programmers are in the habit of using printf and various printf-like functions* as generic print functions even when they don't actually need formatting. In this environment it's easy for someone to make the mistake of writing printf(s) without thinking about where s came from. Since formatting is pretty useless without any data to format, printf(s) has little legitimate use.

printf(s,format,arguments) on the other hand indicates that the programmer deliberately intended formatting to take place.

Afaict this warning is not turned on by default in upstream gcc, but some distros are turning it on as part of their efforts to reduce security holes.

* Both standard C functions like sprintf and fprintf and functions in third party libraries.

plugwash

The underlying reason: printf is declared like:

int printf(const char *fmt, ...) __attribute__ ((format(printf, 1, 2)));

This tells gcc that printf is a function with a printf-style interface where the format string comes first. IMHO it must be literal; I don't think there's a way to tell the good compiler that s is actually a pointer to a literal string it had seen before.

Read more about __attribute__ here.
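The same attribute can be attached to your own printf-style functions so that GCC and Clang check literal format strings passed to them. The following is a sketch under assumptions: the macro PRINTF_LIKE and the function format_message are hypothetical names, and the attribute is compiled out on compilers that do not define __GNUC__.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical convenience macro: expands to the GCC/Clang format
 * attribute where available and to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* The format string is parameter 3; the variadic arguments start at 4. */
int format_message(char *out, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: forwards its arguments to vsnprintf(). */
int format_message(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(out, size, fmt, args);
    va_end(args);
    return written;
}
```

A call like format_message(buf, sizeof buf, "%s=%d", "x", 7) stores "x=7" in buf. With the attribute in place, passing a literal format whose conversions do not match the arguments should draw the same kind of -Wformat diagnostics that printf() itself gets.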

Andreas Spindler
  • "*IMHO it must be literal*" -- There is no requirement for the format string to be a literal (though it's generally a good idea). – Keith Thompson Sep 09 '16 at 18:23
  • Of course. But this was a straight answer to the original question: "So what's the underlying difference between `printf(s)` and `printf("%s", s)` and why do I get a warning in just one case?" You'll get a warning because `printf` is declared with `__attribute__((format(...` AND when the first argument is literal. It happens all at compile-time. At runtime `printf` sees only a pointer, whether passed in form of `"%s"` or `s`. – Andreas Spindler Sep 10 '16 at 06:12
  • uh, this is confusingly put, to say the least. The format string doesn't _have_ to be literal; it's just that the compiler can only check it (that's what the `__attribute__((format))` tells it to do) if it _is_ a literal. – ilkkachu Sep 10 '16 at 07:55
  • Yes. Exactly what I said. – Andreas Spindler Sep 10 '16 at 09:17