2

The following simple program behaves unpredictably. Sometimes it prints "0.00000", sometimes it prints more "0"s than I can count. Sometimes it uses up all the memory on the system before the system either kills some process or the program fails with bad_alloc.

#include <stdio.h>

int main() {
  fprintf(stdout, "%.*f", 0.0); // "%.*f" expects an int precision argument before the double
}

I'm aware that this is incorrect usage of fprintf. There should be another argument specifying the precision of the formatting. It's just surprising that the behavior is so unpredictable. Sometimes it seems to use a default precision, while sometimes it fails very badly. Could this not be made to always fail or always use some default behaviour?

I came across similar usage in some code at work, and spent a lot of time figuring out what was happening. It only seemed to happen with debug builds, but would not happen while debugging with gdb. Another curiosity is that running it through valgrind would consistently trigger the many-"0"s case, which otherwise happens quite seldom, but the memory usage issue would never occur then either.

I am running Red Hat Enterprise Linux 7, and compiled with gcc 4.8.5.

morten
  • 392
  • 1
  • 11
  • When you intentionally trigger undefined behavior, you should look at the disassembly. – user202729 May 29 '19 at 08:46
  • @user202729 good luck with investigating the inner workings of `fprintf` in the disassembled code. – Jabberwocky May 29 '19 at 08:55
  • 2
    Undefined behaviour is undefined. It could indeed be made to always fail in a predictable way - in a language with strict runtime checks to slow down the correct code of everyone else who didn't write UB in the first place. Some compilers will warn about `*printf` format string mismatches, and you can build with an address sanitizer if you want those runtime checks. – Useless May 29 '19 at 09:16
  • The `%.*f` format causes `printf()` (and related functions) to expect TWO arguments, one the width and the other the floating point value to be formatted. Your code only provides one argument following the format string, so the behaviour is undefined. Trying to explain the reason for any particular undefined behaviour (in this case, caused by `printf()` being told to ASSUME two arguments will be provided, but only one is) is pointless. – Peter May 29 '19 at 09:19
  • Oh, and your compiler is one of those that will warn about format string problems (even though it is really old). Use `-Wformat`. – Useless May 29 '19 at 09:22
  • 1
    _"It's just surprising that the behavior is so unpredictable."_ Please please please get yourself out of [this mindset](https://stackoverflow.com/q/54120862/560648)! – Lightness Races in Orbit May 29 '19 at 10:00

4 Answers

5

Formally this is undefined behavior.

As for what you're observing in practice:
My guess is that fprintf ends up using an uninitialized integer as the number of decimal places to output. That's because it tries to read a number from a location where the caller never wrote any particular value, so it gets whatever bits happen to be stored there. If that happens to be a huge number, fprintf will try to allocate a lot of memory to build the result string internally. That would explain the "running out of memory" part.

If the uninitialized value isn't quite that big, the allocation will succeed and you'll end up with a lot of zeroes.

And finally, if the random integer value happens to be just 5, you'll get 0.00000.
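
To make that concrete, here is a minimal sketch of how a printf-style implementation consumes "%.*f": it fetches the int precision from the variable argument list before it fetches the double value. The function print_dotstarf is made up for illustration; real implementations are far more involved.

#include <stdarg.h>
#include <stdio.h>

/* Illustrative only: fetch the precision, then the value, in the
   order a printf-style function handles "%.*f". */
static void print_dotstarf(const char *tag, ...)
{
    va_list ap;
    va_start(ap, tag);
    int precision = va_arg(ap, int);    /* the int the caller was supposed to pass */
    double value  = va_arg(ap, double); /* the value to format */
    printf("%s: %.*f\n", tag, precision, value);
    va_end(ap);
}

int main(void)
{
    print_dotstarf("ok", 5, 0.0);  /* well-defined, prints "ok: 0.00000" */
    /* print_dotstarf("bad", 0.0); -- like the question's call: va_arg(ap, int)
       reads garbage, so the precision could be anything */
    return 0;
}

If the leftover garbage happens to be huge, the formatting code has to produce that many digits, which is exactly the memory blow-up described above.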

Valgrind probably consistently initializes the memory your program sees, so the behavior becomes deterministic.

Could this not be made to always fail

I'm pretty sure it won't even compile if you use gcc -pedantic -Wall -Wextra -Werror.
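
For reference (assuming the source file is saved as fmt.c, a made-up name), that invocation is:

gcc -pedantic -Wall -Wextra -Werror fmt.c

-Wall already includes -Wformat, which catches the mismatched "%.*f", and -Werror turns that warning into a hard error.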

melpomene
  • 84,125
  • 8
  • 85
  • 148
  • The interesting part is why the integer is uninitialized. My understanding is that the bytes passed for `0.0` should be used for that integer. I assume that a `double` is at least as large as an `int` and only the following parameter is missing. That means the value should be random but the length should be fixed. – Gerhardh May 29 '19 at 09:14
  • gcc -pedantic -Wall -Wextra -Werror does indeed result in compilation error. This is probably the most useful bit of information to come out of this question. I shall check if we can enable this in our build. – morten May 29 '19 at 09:14
  • @Gerhardh No, on many platforms integers and doubles use different parameter passing mechanisms. On such a system `fprintf` will first try to get the "next integer" argument (and get garbage), then get the "next double" argument (and get `0.0`). – melpomene May 29 '19 at 09:17
  • @melpomene OK, that makes sense. I have seen that for normal parameter lists but not yet for variable parameters using `...`. I didn't look into asm for a while... ;) – Gerhardh May 29 '19 at 09:21
  • The compiler flags would probably not have helped in our case after all, since the format string was actually a parameter to the function calling fprintf. – morten May 29 '19 at 11:08
  • @morten That sounds like a design flaw. – melpomene May 29 '19 at 11:11
  • @melpomene I agree. I will see if we can enable those flags anyway ;) – morten May 29 '19 at 11:20
2

The format string does not match the arguments, therefore the behaviour of fprintf is undefined. Google "undefined behaviour C" for more information about undefined behaviour.

This would be correct:

// print 0.0 with 7 decimals (prints "0.0000000")
fprintf(stdout, "%.*f", 7, 0.0);

Or maybe you just want this:

// print 0.0 with the default precision of 6 (prints "0.000000")
fprintf(stdout, "%f", 0.0);

About this part of your question: Sometimes it seems to use a default precision, while sometimes it fails very badly. Could this not be made to always fail or always use some default behaviour?

There cannot be any default behaviour: fprintf reads the arguments according to the format string. If the arguments don't match, fprintf ends up with seemingly random values.


About this part of your question: Another curiosity is that running it through valgrind would consistently trigger the many-"0"s case, which otherwise happens quite seldom, but the memory usage issue would never occur then either.

This is just another manifestation of undefined behaviour: under valgrind the conditions are quite different, and therefore the actual undefined behaviour can be different.

Jabberwocky
  • 48,281
  • 17
  • 65
  • 115
  • 1
    @user694733 IMHO there _is_ no point in this question. It's useless to reason about undefined behaviour, especially in this case where a complex library function such as `fprintf` is involved. – Jabberwocky May 29 '19 at 08:54
  • FWIW there are "pointless" questions like [this](https://stackoverflow.com/questions/48270127/can-a-1-a-2-a-3-ever-evaluate-to-true). The help center says that only practical questions are allowed however there's no close reason for "impractical" ones. – user202729 May 29 '19 at 08:56
  • I don't think this answer helps with "It's just surprising that the behavior is so unpredictable." The answer should clearly revolve explaining why UB is random - or maybe just mark as a duplicate of one that does. – UKMonkey May 29 '19 at 08:56
2

Undefined behaviour is undefined.

However, on the x86-64 System V ABI it is well known that arguments are not passed on the stack but in registers: floating-point values go in floating-point registers, and integers go in general-purpose registers. Because the arguments are not laid out next to each other on the stack, their sizes don't matter and a missing argument doesn't shift the ones that follow. Since you never passed any integer in the variable-argument part, the general-purpose register that would hold the precision simply contains whatever garbage it held from before.

The following program shows how floating-point values and integers are passed separately:

#include <stdio.h>

int main() {
    fprintf(stdout, "%.*f\n", 42, 0.0); // int goes in a general-purpose register, double in an FP register
    fprintf(stdout, "%.*f\n", 0.0, 42); // swapped in the source, but each value still lands in "its" register
}

Compiled on x86-64 with GCC and glibc, both printfs produce the same output:

0.000000000000000000000000000000000000000000
0.000000000000000000000000000000000000000000
1

This is undefined behaviour in the standard. It means "anything is fair game" because you're doing something wrong.

The worst part is that almost certainly your compiler warned you, but the warning was ignored. Putting some kind of runtime validation in place, beyond what the compiler does, would incur a cost that everybody pays just so you can do what's wrong.

That's the opposite of what C and C++ stand for: you only pay for what you use. If you want to pay the cost of checking, it's up to you to do the checking.

What really happens depends on the ABI, the compiler and the architecture. It's undefined behaviour because the language gives the implementer the freedom to do whatever is best on each machine (meaning sometimes faster code, sometimes shorter code).

As an example, at machine level calling a function just means instructing the processor to jump to a certain code location.

In some made-up assembly and ABI, then, printf("%.*f", 5, 1.0); would translate into something like

mov A, STR_F ; load the 32-bit address of the format string "%.*f" into register A
mov B, 5     ; load the int precision parameter into register B
mov F0, 1.0  ; load the floating-point parameter into register F0
call printf  ; call the function

Now, if you omit one of the parameters, in this case the one in B, the register simply keeps whatever value it had before.

The thing with functions like printf is that they accept anything in their parameter list (the prototype is printf(const char*, ...), so any arguments are valid as far as the compiler's type system is concerned). That's why you shouldn't use printf in C++: you have better alternatives, such as streams. printf bypasses the compiler's checks; streams are aware of the types involved and are extensible to your own types. Also, that's why your code should compile without warnings.
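
As an aside: if the format string reaches fprintf through a wrapper function (as in an earlier comment thread), GCC and Clang can still check it when the wrapper carries the format function attribute. A minimal sketch, with a made-up wrapper name log_msg:

#include <stdarg.h>
#include <stdio.h>

/* The format attribute tells GCC/Clang to apply the same -Wformat
   checks to callers of log_msg that they apply to printf itself. */
__attribute__((format(printf, 1, 2)))
static void log_msg(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stdout, fmt, ap);
    va_end(ap);
}

int main(void)
{
    log_msg("%.*f\n", 0.0); /* now diagnosed at compile time: the int precision argument is missing */
    return 0;
}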

Mirko
  • 1,043
  • 6
  • 12