8

How do functions such as printf extract digits from a floating point number? I understand how this could be done in principle. Given a number x, of which you want the first n digits, scale x by a power of 10 so that x is between pow(10, n) and pow(10, n-1). Then convert x into an integer, and take the digits of the integer.
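
For illustration, here is a minimal sketch of that naive approach (the function name and the choice of n are mine, and n is limited to 18 so the truncated value fits in a long long):

```c
#include <stdio.h>
#include <math.h>

/* Naive digit extraction: scale x into [10^(n-1), 10^n), truncate to an
 * integer, and print its digits. This sketches the approach described in
 * the question, not what printf does; digits past the first ~16 are noise. */
static void naive_digits(double x, int n) {
    int e = (int)floor(log10(x));               /* position of the leading digit */
    double scaled = x * pow(10.0, n - 1 - e);   /* now in [10^(n-1), 10^n) */
    printf("%lld\n", (long long)scaled);        /* truncate and print */
}

int main(void) {
    double x = 1.0 / 3.0;
    naive_digits(x, 18);   /* typically prints 333333333333333312 */
    printf("%.18f\n", x);  /* a correct printf: 0.333333333333333315 */
    return 0;
}
```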

I tried this, and it worked. Sort of. My answer was identical to the answer given by printf for the first 16 decimal digits, but tended to differ on the ones after that. How does printf do it?

Eric Postpischil
Alecto Irene Perez
    a `double` (64-bit IEEE-754 floating point number) only has about 15 digits of precision anyway – kmdreko Jun 26 '18 at 23:02
  • @vu1p3n0x: An IEEE-754 floating-point number represents one number exactly. It is, by specification in the IEEE-754 standard, infinitely precise. Various applications sometimes desire to convert such numbers to decimal exactly. – Eric Postpischil Jun 26 '18 at 23:26
  • @EricPostpischil: What you write is technically correct but what vu1pn3n0x is talking about is the precision of the data type, which is ~15-16 digits across most of its range. – Dietrich Epp Jun 26 '18 at 23:54
  • @DietrichEpp: The data type is exact across its entire range. The IEEE-754 standard is clear on this: Every floating-point number represents one number exactly. The **operations** perform rounding. There is no approximation in the numbers. What vu1p3n0x is talking about is a crude notion about using binary floating-point for decimal. But not only is it a poor notion for working with floating-point, it is not relevant to this question. The OP does not ask about anything involving floating-point having been used to attempt decimal arithmetic. They ask about correctly printing floating-point. – Eric Postpischil Jun 27 '18 at 00:04
  • @EricPostpischil: Your usage of the word "precision" is not in agreement with the IEEE 754 standard. *precision: **The maximum number p of significant digits that can be represented in a format,** or the number of digits to that a result is rounded.* Please do not pretend only the second meaning exists just so you can correct someone who uses the first meaning. – Dietrich Epp Jun 27 '18 at 00:11
  • Citation: p.5 IEEE Std 754-2008. – Dietrich Epp Jun 27 '18 at 00:23
  • @DietrichEpp: My comments do not contain the word “precision.” I used the term “infinitely precise,” which IEEE-754 does when referring to an exact mathematical number. In other words, the bits used to represent a number in a floating-point number mean the format has some limited precision, but the number represented is infinitely precise—it is exact. – Eric Postpischil Jun 27 '18 at 11:13
  • @EricPostpischil: The term “infinitely precise” in the standard is only ever used to describe intermediate results before “rounding to a floating-point number”. I would agree that you can describe a number as being exact or inexact, however, I would avoid using the word “infinitely precise” when talking about a floating-point number because this usage is confusing, which explains why the standard carefully avoids this usage. For a similar reason you have a legitimate concern that talking about the “precision” of a `double` could be ambiguous, but I think its meaning is clear. – Dietrich Epp Jun 27 '18 at 16:28
  • @DietrichEpp: Your discussion about terminology is irrelevant to the point. (Notwithstanding the fact that I use “infinitely precise” in the same **way** the standard does even if I apply it to a different object. Adjective phrases are not tied to a single object; they can be applied to other objects.) You may substitute “exact” for “infinitely precise” in my comment, and it remains correct, and vu1p3n0x's comment remains both a bad conception of how floating-point works and irrelevant to this Stack Overflow question. – Eric Postpischil Jun 27 '18 at 16:38

3 Answers

6

The classic implementation is David Gay's dtoa. The exact details are somewhat arcane (see Why does "dtoa.c" contain so much code?), but in general it works by doing the base conversion with more precision than you can get from a 32-bit, 64-bit, or even 80-bit floating-point number. To do this, it uses so-called "bigints" or arbitrary-precision numbers, which can hold as many digits as you can fit in memory. Gay's code has been copied, with modifications, into countless other libraries, including common implementations of the C standard library (so it might power your printf), Java, Python, PHP, JavaScript, etc.

(As a side note, not all of these copies of Gay's dtoa code were kept up to date; because PHP used an old version of strtod, it hung when parsing 2.2250738585072011e-308.)

In general, if you do things the "obvious" and simple way, like multiplying by a power of 10 and then converting to an integer, you will lose a small amount of precision and some of the results will be inaccurate, though you may well get the first 14 or 15 digits correct. Gay's implementation of dtoa() claims to get all the digits correct, but as a result the code is quite difficult to follow. If you skip to the bottom of the file to see strtod itself, you can see that it starts with a "fast path" that just uses ordinary floating-point arithmetic, then detects whether that result is incorrect and, if so, falls back to a more reliable bigint algorithm that works in all cases (but is slower).
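
As a crude illustration of that fast-path-then-fallback structure (not Gay's actual algorithm), you can emulate "try cheap, verify, escalate" with the standard library itself; shortest_repr here is a made-up helper:

```c
#include <stdio.h>
#include <stdlib.h>

/* Try a short, cheap conversion first, and only escalate to more digits
 * when a round-trip check shows the short string does not identify the
 * original value. For IEEE-754 doubles, 17 significant digits always
 * round-trip, so the loop is guaranteed to stop for finite values. */
static void shortest_repr(double x, char *buf, size_t size) {
    for (int prec = 15; prec <= 17; prec++) {
        snprintf(buf, size, "%.*g", prec, x);
        if (strtod(buf, NULL) == x)   /* does it round-trip exactly? */
            return;
    }
}

int main(void) {
    char buf[32];
    shortest_repr(0.1, buf, sizeof buf);
    printf("%s\n", buf);              /* prints "0.1" */
    shortest_repr(2.2250738585072011e-308, buf, sizeof buf);
    printf("%s\n", buf);              /* needs the 17-digit fallback */
    return 0;
}
```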

The implementation has the following citation, which you may find interesting:

 * Inspired by "How to Print Floating-Point Numbers Accurately" by
 * Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126].

The algorithm works by calculating the range of decimal numerals that convert back to the given binary number; as more digits are produced, the range gets smaller and smaller, until you either have an exact result or can correctly round to the requested number of digits.

In particular, from sec. 2.2, Algorithm:

The algorithm uses exact rational arithmetic to perform its computations so that there is no loss of accuracy. In order to generate digits, the algorithm scales the number so that it is of the form 0.d1d2..., where d1, d2, ..., are base-B digits. The first digit is computed by multiplying the scaled number by the output base, B, and taking the integer part. The remainder is used to compute the rest of the digits using the same approach.

The algorithm can then continue until it has the exact result (which is always possible, since floating-point numbers are base 2, and 2 is a factor of 10) or until it has as many digits as requested. The paper goes on to prove the algorithm's correctness.
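
Here is a hedged sketch of that digit loop, restricted (for brevity) to a double in (0, 1) whose exact value num / 2^k fits in the nonstandard unsigned __int128 type (a GCC/Clang extension); a real implementation uses arbitrary-precision integers and covers the whole exponent range:

```c
#include <stdio.h>
#include <math.h>

/* Exact digit generation in the Steele & White style: hold x as the
 * exact fraction num / 2^k, then peel off one decimal digit per step by
 * multiplying the remainder by 10 and taking the integer part. */
static void print_digits(double x, int ndigits) {
    if (x <= 0 || x >= 1) { puts("outside this sketch's range"); return; }
    int e;
    double f = frexp(x, &e);   /* x == f * 2^e, with 0.5 <= f < 1 */
    unsigned __int128 num = (unsigned long long)ldexp(f, 53); /* 53-bit significand */
    int k = 53 - e;            /* x == num / 2^k exactly, k >= 53 here */
    if (k > 120) { puts("exponent too small for this sketch"); return; }
    printf("0.");
    for (int i = 0; i < ndigits && num != 0; i++) {
        num *= 10;                              /* shift one decimal place */
        int digit = (int)(num >> k);            /* integer part = next digit */
        putchar('0' + digit);
        num -= (unsigned __int128)digit << k;   /* keep the exact remainder */
    }
    putchar('\n');
}

int main(void) {
    /* Terminates after 55 digits, the exact expansion of the double
       nearest 0.1: 0.1000000000000000055511151231257827021181583404541015625 */
    print_digits(0.1, 60);
    return 0;
}
```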


Also note that not all implementations of printf are based on Gay's dtoa; it is just a particularly common implementation that has been copied a lot.

Dietrich Epp
2

There are various ways to convert floating-point numbers to decimal numerals without error (either exactly or with rounding to a desired precision).

One method is to use arithmetic as taught in elementary school. C provides functions to work with floating-point numbers, such as frexp, which separates the fraction (also called the significand, often mistakenly called a mantissa) and the exponent. Given a floating-point number, you could create a large array to store decimal digits in and then compute the digits. Each bit in the fraction part of a floating-point number represents some power of two, as determined by the exponent in the floating-point number. So you can simply put a “1” in an array of digits and then use elementary school arithmetic to multiply or divide it the required number of times. You can do that for each bit and then add all the results, and the sum is the decimal numeral that equals the floating-point number.
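
To make that concrete, here is a hedged sketch in the same spirit, restricted to a double in (0, 1): rather than keeping a separate decimal numeral per bit and summing at the end, it folds the bits in Horner fashion, halving one decimal digit array once per bit (print_exact and the range restriction are mine):

```c
#include <stdio.h>
#include <math.h>

#define MAXD 1100   /* a double's fraction needs at most ~1074 decimal digits */

/* Print the exact decimal expansion of a double x in (0, 1).
 * Writing x == m / 2^k, Horner's rule gives
 *   m / 2^k = (...((b0)/2 + b1)/2 ... + b_(k-1)) / 2,
 * where b_j is bit j of m, so each pass adds a bit and halves the
 * decimal digit array using elementary school long division by 2. */
static void print_exact(double x) {
    if (x <= 0 || x >= 1) return;
    int e;
    double f = frexp(x, &e);                  /* x == f * 2^e */
    unsigned long long m = (unsigned long long)ldexp(f, 53); /* 53-bit integer */
    int k = 53 - e;                           /* x == m / 2^k exactly */

    int digits[MAXD] = {0};                   /* decimal fraction 0.d0 d1 d2 ... */
    for (int j = 0; j < k; j++) {
        /* carry starts as the incoming bit; halving propagates remainders */
        int carry = (j < 53) ? (int)((m >> j) & 1) : 0;
        for (int i = 0; i < MAXD; i++) {
            int d = carry * 10 + digits[i];
            digits[i] = d / 2;
            carry = d % 2;
        }
    }
    int last = MAXD - 1;
    while (last > 0 && digits[last] == 0) last--;   /* trim trailing zeros */
    printf("0.");
    for (int i = 0; i <= last; i++) putchar('0' + digits[i]);
    putchar('\n');
}

int main(void) {
    /* 0.1000000000000000055511151231257827021181583404541015625 */
    print_exact(0.1);
    return 0;
}
```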

Commercial printf implementations will use more sophisticated algorithms. Discussing them is beyond the scope of a Stack Overflow question-and-answer. The seminal paper on this is Correctly Rounded Binary-Decimal and Decimal-Binary Conversions by David M. Gay. (A copy appears to be available here, but that seems to be hosted by a third party; I am not sure how official or durable it is. A web search may turn up other sources.) A more recent paper with an algorithm for converting a binary floating-point number to decimal with the shortest number of digits needed to uniquely distinguish the value is Printing Floating-Point Numbers: An Always Correct Method by Marc Andrysco, Ranjit Jhala, and Sorin Lerner.

One key to how it is done is that printf will not just use the floating-point format and its operations to do the work. It will use some form of extended-precision arithmetic, either by working with parts of the floating-point number in an integer format with more bits, by separating the floating-point number into pieces and using multiple floating-point numbers to work with it, or by using a floating-point format with more precision.

Note that the first step in your question, multiplying x by a power of ten, already has two rounding errors. First, not all powers of ten are exactly representable in binary floating-point, so just producing such a power of ten necessarily has some representation error. Then, multiplying x by another number often produces a mathematical result that is not exactly representable, so it must be rounded to the floating-point format.
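
Both errors are easy to demonstrate (the outputs shown assume IEEE-754 binary64 with a correctly rounding printf):

```c
#include <stdio.h>

int main(void) {
    /* 10^23 is not representable in binary64; the nearest double is
     * 2^23 below the true value, and printf shows it exactly. */
    printf("%.0f\n", 1e23);          /* 99999999999999991611392 */

    /* Even when the power of ten is exact (1e17 is), the product
     * usually needs rounding to the nearest double. */
    double x = 1.0 / 3.0;
    printf("%.0f\n", x * 1e17);      /* 33333333333333332, not ...333 */
    return 0;
}
```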

Eric Postpischil
1

Neither the C standard nor the C++ standard dictates a particular algorithm for such things. It is therefore impossible to say in general how printf does this.

If you want to know an example of a printf implementation, you can have a look here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c and here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/printf_fp.c

klutt
  • The mathematical properties of a floating-point number can be used to convert it to decimal without any knowledge of its representation, so one algorithm could be used for general floating-point formats. – Eric Postpischil Jun 26 '18 at 23:52
  • Also, there are very few platforms which don't use IEEE754 nowadays. – Simon Byrne Jun 26 '18 at 23:53