29

The %g specifier doesn't seem to behave the way most sources document it.

According to most sources I've found, across multiple languages that use printf specifiers, the %g specifier is supposed to be equivalent to either %f or %e - whichever would produce shorter output for the provided value. For instance, at the time of writing this question, cplusplus.com says that the g specifier means:

Use the shortest representation: %e or %f

And the PHP manual says it means:

g - shorter of %e and %f.

And here's a Stack Overflow answer that claims that

%g uses the shortest representation.

And a Quora answer that claims that:

%g prints the number in the shortest of these two representations

But this behaviour isn't what I see in reality. If I compile and run this program (as C or C++ - it's a valid program with the same behaviour in both):

#include <stdio.h>

int main(void) {
    double x = 123456.0;
    printf("%e\n", x);
    printf("%f\n", x);
    printf("%g\n", x);
    printf("\n");

    double y = 1234567.0;
    printf("%e\n", y);
    printf("%f\n", y);
    printf("%g\n", y);
    return 0;
}

... then I see this output:

1.234560e+05
123456.000000
123456

1.234567e+06
1234567.000000
1.23457e+06

Clearly, the %g output doesn't quite match either the %e or %f output for either x or y above. What's more, it doesn't look like %g is minimising the output length either; y could've been formatted more succinctly if, like x, it had not been printed in scientific notation.

Are all of the sources I've quoted above lying to me?

I see identical or similar behaviour in other languages that support these format specifiers, perhaps because under the hood they call out to the printf family of C functions. For instance, I see this output in Python:

>>> print('%g' % 123456.0)
123456
>>> print('%g' % 1234567.0)
1.23457e+06

In PHP:

php > printf('%g', 123456.0);
123456
php > printf('%g', 1234567.0);
1.23457e+6

In Ruby:

irb(main):024:0* printf("%g\n", 123456.0)
123456
=> nil
irb(main):025:0> printf("%g\n", 1234567.0)
1.23457e+06
=> nil

What's the logic that governs this output?

Mark Amery
  • The description I always remember is not "shorter", but "use `%e` or `%f`, whichever provides maximum precision using the least amount of space". – Steve Summit Feb 03 '23 at 20:04

2 Answers

35

This is the full description of the g/G specifier in the C11 standard:

A double argument representing a floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), depending on the value converted and the precision. Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X:

     if P > X ≥ −4, the conversion is with style f (or F) and precision P − (X + 1).
     otherwise, the conversion is with style e (or E) and precision P − 1.

Finally, unless the # flag is used, any trailing zeros are removed from the fractional portion of the result and the decimal-point character is removed if there is no fractional portion remaining.

A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.

This behaviour is somewhat similar to simply using the shortest representation out of %f and %e, but not equivalent. There are two important differences:

  • Trailing zeros (and, potentially, the decimal point) get stripped when using %g, which can cause the output of a %g specifier to not exactly match what either %f or %e would've produced.
  • The decision about whether to use %f-style or %e-style formatting is made based purely upon the size of the exponent that would be needed in %e-style notation, and does not directly depend on which representation would be shorter. There are several scenarios in which this rule results in %g selecting the longer representation, like the one shown in the question where %g uses scientific notation even though this makes the output 4 characters longer than it needs to be.
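To see the rule in action, here is a small C sketch (my own illustration, not from the standard) that applies the P/X decision by hand to the two values from the question and prints the real %g output alongside. X is approximated with floor(log10(v)), which can disagree with the true %e exponent for values that round up to the next power of ten, so treat it as a demonstration rather than a reimplementation:

#include <math.h>
#include <stdio.h>

/* Apply the C11 rule for %g by hand: with precision P (default 6), let X be
   the exponent that %e would use; pick style f with precision P-(X+1) when
   P > X >= -4, otherwise style e with precision P-1. Trailing-zero stripping
   is not reproduced here, and v is assumed to be finite and nonzero. */
static void explain_g(double v) {
    int P = 6;                          /* default precision        */
    int X = (int)floor(log10(fabs(v))); /* rough %e exponent for v  */

    if (P > X && X >= -4)
        printf("P=%d, X=%d -> style f: %.*f\n", P, X, P - (X + 1), v);
    else
        printf("P=%d, X=%d -> style e: %.*e\n", P, X, P - 1, v);

    printf("          actual %%g:   %g\n", v);
}

int main(void) {
    explain_g(123456.0);  /* X=5 < P=6  -> f-style, precision 0 */
    explain_g(1234567.0); /* X=6 >= P=6 -> e-style, precision 5 */
    return 0;
}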

In case the C standard's wording is hard to parse, the Python documentation provides another description of the same behaviour:

General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude.

The precise rules are as follows: suppose that the result formatted with presentation type 'e' and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type 'f' and precision p-1-exp. Otherwise, the number is formatted with presentation type 'e' and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it.

Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision.

A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.
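The trailing-zero stripping and the role of the precision, both described in the quotes above, are easy to see directly. The expected outputs in the comments below are what the quoted rules predict for a conforming printf:

#include <stdio.h>

int main(void) {
    /* Trailing zeros (and a bare decimal point) are stripped unless # is used. */
    printf("%g\n",  100.50);  /* 100.5    (f-style, zeros stripped)      */
    printf("%g\n",  100.00);  /* 100      (decimal point removed too)    */
    printf("%#g\n", 100.00);  /* 100.000  (# flag keeps trailing zeros)  */

    /* The precision is the number of significant digits, not decimal places. */
    printf("%.3g\n", 123456.0);  /* 1.23e+05 (X=5 >= P=3 -> e-style) */
    printf("%.8g\n", 1234567.0); /* 1234567  (X=6 <  P=8 -> f-style) */
    return 0;
}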

The many sources on the internet that claim that %g just picks the shortest out of %e and %f are simply wrong.

Mark Amery
  • Calling them "sources" might give too much credibility to those web pages. – rici Jan 12 '19 at 19:05
  • I'm confused. This spec seems incomplete. 100.611 is X=2, and is therefore printed with precision %f of 3, which is 100. But it prints out 100.611... ah, precision in %f must be .d not sigfigs! – rrauenza Jan 19 '22 at 17:33
  • So I tried printing 10000.30 with P omitted; the output should be in style f, with trailing zeros removed, but there is a 3 after the decimal point, so why doesn't the output have .3 after 10000? I am getting an output of 10000. – Nikhil Jan 26 '22 at 16:12
-3

My favorite format for doubles is "%.15g". It seems to do the right thing in every case. I'm pretty sure 15 is the maximum reliable decimal precision in a double as well.
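For what it's worth, here is a small sketch (mine, not part of this answer) contrasting DBL_DIG (15) and DBL_DECIMAL_DIG (17, available since C11's <float.h>) via %.*g, which is what the comments below are getting at: 15 significant digits are pleasant to read, but 17 are needed to guarantee that an arbitrary double survives a round trip through text:

#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double x = 0.1 + 0.2;  /* exact double value is 0.30000000000000004... */
    char buf[64];

    /* %.15g: readable, but two different doubles can print identically. */
    snprintf(buf, sizeof buf, "%.*g", DBL_DIG, x);
    printf("%%.15g -> %s, round-trips: %s\n",
           buf, strtod(buf, NULL) == x ? "yes" : "no");  /* 0.3, no */

    /* %.17g: enough digits to recover the exact bits of any double. */
    snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, x);
    printf("%%.17g -> %s, round-trips: %s\n",
           buf, strtod(buf, NULL) == x ? "yes" : "no");  /* 0.30000000000000004, yes */
    return 0;
}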

  • -1; this doesn't answer the question. – Mark Amery Jan 13 '19 at 21:37
  • Other people have answered the general question, so I saw no need to repeat it. I was just offering a helpful suggestion. – Patrick Chkoreff Jan 14 '19 at 18:56
  • A 64-bit floating point value can **accurately** represent decimal values with [a length of up to 767 places](https://stackoverflow.com/a/62542806/1889329). `"%.15g"` does the right thing if you need a decimal representation *without loss of information*. It doesn't do the right thing if your requirement is to get the decimal representation *with full precision*. – IInspectable May 11 '21 at 09:06
  • It's interesting that your example 0.1000000000000000055511151231257827021181583404541015625 is showing 55 decimal digits. That is far more information than can fit in a 64 bit double. Each decimal digit requires about 3.32 bits to specify, so 55 of them would require 183 bits, not to mention room for sign and exponent. So I'm not sure exactly what your example is displaying there. I suspect all those extra digits have no significance whatsoever. What do you think? – Patrick Chkoreff May 12 '21 at 13:40
  • @pat I'm not sure I understand what you're trying to say. The value you copied is the exact decimal representation of a `double` value that's closest to `0.1` (since `0.1` has no finite representation in IEEE floats). That clearly does fit inside 64 bits of information. As for the significance: All those digits *are* significant. Everything to the left and everything to the right of that decimal is all zeros. You are probably just doing the math wrong. – IInspectable Aug 04 '22 at 14:17
  • @PatrickChkoreff There are at least three numbers of interest. (1) If you want to know how many decimal digits you can accurately capture as a C `double`, that's `DBL_DIG` from `<float.h>`, typically 15. (2) If you want to know how many decimal digits you need to unambiguously capture a C `double`, that's a few more, it's `DBL_DECIMAL_DIG`, typically 17. (3) But if you want to *perfectly* capture a base-2 fraction in decimal, that ends up taking exactly as many digits as there are bits in the fraction. That is, a `double` with 53 bits of significance might take 53 significant digits. – Steve Summit Feb 03 '23 at 19:27
  • No, you can't represent every 53-digit fraction unambiguously as a 64-bit double. But a value like `0x0.1999999999999a` (the hexadecimal fraction closest to 0.1 decimal) requires 55 digits to represent perfectly accurately in decimal: it's that number 0.1000000000000000055511151231257827021181583404541015625 . – Steve Summit Feb 03 '23 at 19:31
  • Every time you add a bit to a binary fraction, you end up adding a digit to its decimal equivalent: `0b0.1` is 0.5, `0b0.11` is 0.75, `0b0.111` is 0.875, `0b0.1111` is 0.9375, `0b0.11111` is 0.96875, ... – Steve Summit Feb 03 '23 at 19:35
  • I wrote: "Each decimal digit requires about 3.32 bits to specify." IInspectable wrote: "You are probably just doing the math wrong." I just got out the calculator and computed the base-2 logarithm of 10. That's ln(10)/ln(2), which is approximately 3.3219, as I said. I'm just talking basic information theory here. – Patrick Chkoreff Feb 05 '23 at 22:51