3

I am reading C Primer Plus by Stephen Prata, and one of the first ways it introduces floats is talking about how they are accurate to a certain point. It says specifically "The C standard provides that a float has to be able to represent at least six significant figures...A float has to represent accurately the first six numbers, for example, 33.333333"

This is odd to me, because it makes it sound like a float is accurate up to six digits, but that is not true. 1.4 is stored as 1.39999... and so on. You still have errors.

So what exactly is being provided? Is there a cutoff for how accurate a number is supposed to be?

In C, you can't store more than six significant figures in a float without getting a compiler warning, but why? If you were to do more than six figures it seems to go just as accurately.

This is made even more confusing by the section on underflow and subnormal numbers. When you have a number that is the smallest a float can be, and divide it by 10, the errors you get don't seem to be subnormal? They seem to just be the regular rounding errors mentioned above.

So why is the book saying floats are accurate to six digits and how is subnormal different from regular rounding errors?

Akimbo
  • 1
    Where in the C standard does it say that? https://port70.net/~nsz/c/c11/n1570.html – Govind Parmar Jan 31 '19 at 22:44
  • 3
    The example has 8 significant figures – M.M Jan 31 '19 at 22:46
  • I didn't write the 33.333333. It's straight out of the book. It implied that 33.3333 would be saved and the rest would be truncated. – Akimbo Jan 31 '19 at 22:52
  • 3
    @GovindParmar: C 2018 5.2.4.2.2 12 says that `FLT_DIG` must be at least 6, and it is the number of decimal digits, *q*, such that any floating-point number with *q* decimal digits (for example “1.40000e0” in input) can be rounded into a floating-point number with *p* radix *b* digits (by which it refers to one of the internal formats, such as `float`, `double`, or `long double`) and back again without change to the *q* decimal digits. – Eric Postpischil Jan 31 '19 at 22:53
  • So does that mean if you use more significant digits than a float can hold, when you do arithmetic, it can have errors even beyond the approximation it started with, or how does this work? – Akimbo Jan 31 '19 at 23:00
  • @M.M: Working on it. – Eric Postpischil Jan 31 '19 at 23:09
  • The required conversion from base10 to base2 inevitably causes loss of precision. The *float* type can represent 2^24 distinct values, 23 bits stored and one implied. That's 16,777,216 distinct values or log10(16777216) = 7.22 digits of precision. But the conversion is applied *twice* when you print the value, first from base10 to base2 and then back again to base10. Every conversion loses 0.5 bit of precision due to rounding. So 24 - 2*0.5 = 23 bits in practice, log10(2^23) = 6.92 digits. Close but no cigar, that's 6. – Hans Passant Jan 31 '19 at 23:51
  • The question says “In C, you can't store more than six significant figures in a float without getting a compiler warning,” but that is not my experience. Some compilers might warn you that a number in source text is not exactly representable, but I do not recall seeing a warning just because a numeral in source text has too many digits, especially if it represents a number that is representable in the target type. – Eric Postpischil Feb 01 '19 at 00:09
  • 1
    @Akimbo: you're asking an important question about a very big subject. Extremely knowledgeable folks (especially Eric Postpischil) have given you some very detailed answers. Q: Is it *helping* you? If not, please read [this](https://floating-point-gui.de/), [this](http://blog.reverberate.org/2014/09/what-every-computer-programmer-should.html) and/or [this](http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)FloatingPoint.html). Please post back any specific questions. – paulsm4 Feb 01 '19 at 04:03
  • I still have the same questions I've been asking in my replies to Eric Postpischil. I have been able to find numbers with six significant figures that lose their value when changed to decimal, and I can find numbers that have more than that and don't lose their value when changed to decimal. I can only assume this is a massive misunderstanding on my part, but this C standard for decimal conversion seems to be arbitrary and non-functional – Akimbo Feb 01 '19 at 06:56
  • @Akimbo "I have been able to find numbers with six significant figures that lose their value when changed to decimal" --> provide your counter example. All `f` in [33.33325f ... 33.3333499999...f] will print "33.3333" with `printf("%.4f\n", f);`. – chux - Reinstate Monica Feb 02 '19 at 21:41

2 Answers

6

Suppose you have a decimal numeral with q significant digits:

$d_{q-1}.d_{q-2}d_{q-3}\ldots d_0$,

and let’s also make it a floating-point decimal numeral, meaning we scale it by a power of ten:

$d_{q-1}.d_{q-2}d_{q-3}\ldots d_0 \cdot 10^{e}$.

Next, we convert this number to float. Many such numbers cannot be exactly represented in float, so we round the result to the nearest representable value. (If there is a tie, we round to make the low digit even.) The result (if we did not overflow or underflow) is some floating-point number x. By the definition of floating-point numbers (in C 2018 5.2.4.2.2 3), it is represented by some number of digits in some base scaled by that base to a power. Supposing it is base two, x is:

$b_{p-1}.b_{p-2}b_{p-3}\ldots b_0 \cdot 2^{p}$.

Next, we convert this float x back to decimal with q significant digits. Similarly, the float value x might not be exactly representable as a decimal numeral with q digits, so we get some possibly new number:

$n_{q-1}.n_{q-2}n_{q-3}\ldots n_0 \cdot 10^{m}$.

It turns out that, for any float format, there is some number q such that, if the decimal numeral we started with is limited to q digits, then the result of this round-trip conversion will equal the original number. Each decimal numeral of q digits, when rounded to float and then back to q decimal digits, results in the starting number.

In the 2018 C standard, clause 5.2.4.2.2, paragraph 12, tells us this number q must be at least 6 (a C implementation may support larger values), and the C implementation should define a preprocessor symbol for it (in float.h) called FLT_DIG.

So, considering your example number, 1.4: when we convert it to float in the IEEE-754 basic 32-bit binary format, we get exactly 1.39999997615814208984375 (that is its mathematical value, shown in decimal for convenience; the actual bits in the object represent it in binary). When we convert that to decimal with full precision, we get “1.39999997615814208984375”. But if we convert it to decimal with rounding to six digits, we get “1.40000”. So 1.4 survives the round trip.

In other words, it is not true in general that six decimal digits can be represented in float without change, but it is true that float carries enough information that you can recover six decimal digits from it.

Of course, once you start doing arithmetic, errors will generally compound, and you can no longer rely on six decimal digits.

Eric Postpischil
  • This will sound beginnerish because it is, but what exactly do we mean when we say we are "converting it to decimal"? When I write a float literal in an IDE, it's a float literal, and I store it in a float. 1.4 will always be 1.39999997615814208984375. So when does the rounding to decimal happen? In just a C compiler? Or will all computers and all programs round 1.39999997615814208984375 to 1.4 decimal when asked? Since an integer can't be 1.4, and floats are used to represent them, I thought a decimal form was more of an idea than an actual thing we convert to (at least, to a computer). – Akimbo Jan 31 '19 at 23:19
  • 1
    @Akimbo: In general, conversion is an operation (or function) whose input is one type and whose output is another type and for which the output value is as close to the input value as possible. For example, converting a pointer to `int` to a pointer to `char` produces a pointer to the same place in memory but with a different type. Conversion of three in a `float` to an `int` produces three in an `int`. It is just a change of representation with as little change in value as possible. – Eric Postpischil Jan 31 '19 at 23:29
  • 1
    @Akimbo: When `1.4f` appears in the source text of a program, it is converted to `float` during translation (compilation). The C implementation (usually the compiler at this point) rounds it (most often using round-to-nearest-ties-to-even, but other rules are possible). If you write `float x = 1.4;`, then `1.4` is converted to `double`, because `1.4` without the `f` is interpreted as a `double` constant, then, because it is being used to initialize a `float`, it is converted to `float`. When you print it with `printf` and some format like `%f` or `%g`, it is converted to decimal. – Eric Postpischil Jan 31 '19 at 23:31
  • So in terms of C, since decimal isn't really a data type, more a way to represent output, does that mean this is only relevant for IO functions? If everything going on behind the scenes is float arithmetic and it only really gets converted to decimal for functions like printf, is this book section a long winded way of saying "functions like printf that convert to decimal are rounded accurately to six sig digits?" Edit: I do understand casting and float literals, I'm just confused where the actual implications of this six digit thing crops up other than printf – Akimbo Feb 01 '19 at 01:01
  • @Akimbo: What it is telling you is sort of a measure of how much information there is in a `float`. It means that if you convert a decimal numeral—by any means—to a `float` and then convert it back—by any means—then you will get the original number back, provided it had at most *q* digits and you rounded the result to *q* digits. (The conversions must have been done with correct rounding; some software is sloppy about that.) The conversions could have been done by `scanf` and `printf` or by compiling from source text or by your own software. It is just saying `float` is enough for *q* digits. – Eric Postpischil Feb 01 '19 at 01:08
  • In that case, shouldn't printing 1.4 with printf("%.10f", x), where x is a float assigned 1.4, print 1.40000(etc.). Is that not a float assigned two significant figures being converted to decimal for output? I understand it's actually 1.39 but if decimal is for printf or source code why doesn't this work – Akimbo Feb 01 '19 at 01:41
  • @Akimbo: Huh? If `FLT_DIG` is 6, meaning the *q* described above is six, that means if you convert the six-digit “1.40000” from decimal to `float` and then to a **six-digit** decimal numeral, you will get “1.40000”. The specification does not say anything about what the result will be when you convert to an **eleven-digit** (`%.10f` is the digits before the decimal point plus ten more) decimal numeral. There is nothing that says that would produce “1.4000000000”. There is no assignment of “two significant figures”. No knowledge of a number of “decimal figures” is retained in a `float`. – Eric Postpischil Feb 01 '19 at 01:52
  • I'm just looking for consistency because then I can find examples where a 7 digit literal is used, stored in a float, then called back with the same accuracy as a 5 digit number. If 1.437162f is stored in a float (and it sounds like I get a compiler warning if I don't cast it to float, but you don't get that same warning) then I print it with just a %e specifier, I get the original value. It just seems like everything works the same regardless and printf rounds a float with the same accuracy whether it's over or under 6 digits. – Akimbo Feb 01 '19 at 02:24
  • I think the reason I'm getting confused is there are a lot of examples of floats assigned to literals longer than six digits that still have enough info to be rounded correctly, like 5.466354. But this may be more luck than the standard? – Akimbo Feb 01 '19 at 04:28
  • @Akimbo: The rule is that **if** a decimal numeral has at most *q* digits, **then** rounding to `float` and back to *q* decimal digits produces the same number. It does not say that if a number has more digits, the round-trip will not work. `FLT_DIG` tells us that `float` has enough resolution to pin down **every** six-digit decimal number. But seven-digit numbers are too close together; the `float` values are spread too far apart to ensure there is at least one for every seven-digit number. So some seven-digit numbers will be near `float` values and will round as desired, and some will not. – Eric Postpischil Feb 01 '19 at 13:05
  • So essentially I couldn't POSSIBLY find a number that is six significant digits and, when converted to float and rounded back to six digits, could be rounded wrong, whereas I could for seven digits but it would be very hard since the values are still fairly close to each other with gaps? (As I'm writing this I'm just making sure I understand by decreasing a six digit value in a float calculator seeing if anything is rounded wrong). – Akimbo Feb 01 '19 at 19:01
  • @Akimbo: Pretty much. It is not very hard to find seven-digit numbers that fail, especially if you know where to look. – Eric Postpischil Feb 01 '19 at 19:11
3

Thanks to Govind Parmar for citing an on-line example of C11 (or, for that matter C99).

The "6" you're referring to is "FLT_DECIMAL_DIG".

http://c0x.coding-guidelines.com/5.2.4.2.2.html

number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,

$$\begin{cases} p \log_{10} b & \text{if } b \text{ is a power of 10} \\ \lceil 1 + p \log_{10} b \rceil & \text{otherwise} \end{cases}$$

FLT_DECIMAL_DIG 6
DBL_DECIMAL_DIG 10
LDBL_DECIMAL_DIG 10

"Subnormal" means:

What is a subnormal floating point number?

A number is subnormal when the exponent bits are zero and the mantissa is non-zero. They're numbers between zero and the smallest normal number. They don't have an implicit leading 1 in the mantissa.


STRONG SUGGESTION:

If you're unfamiliar with "floating point arithmetic" (or, frankly, even if you are), this is an excellent article to read (or review):

What Every Programmer Should Know About Floating-Point Arithmetic

paulsm4
  • What does "and back again" mean in the quote? How do you un-round a number? – M.M Jan 31 '19 at 22:52
  • 1
    This is the wrong direction. `FLT_DECIMAL_DIG` is for rounding from a floating-point object in a C program to a decimal numeral and then back to the original floating-point type. The question asks about preserving decimal digits, meaning you go from a decimal numeral to a C object and then back to a decimal numeral. This is covered by `FLT_DIG` in the next item in the standard. – Eric Postpischil Jan 31 '19 at 22:55
  • I think the rounding here refers to the loss of digits through conversion to binary. _i.e._ you take `p` digits from the string representation, convert it to float, then convert back to a string representation. – paddy Jan 31 '19 at 22:56
  • Could you clarify this for the layman? This is the beginning stages of my book and I have been playing around in the IEEE calculator a lot to get a grip on it, but I still don't understand. What happens if you are exceeding the digits outlined in FLT_DECIMAL_DIG or FLT_DIG? – Akimbo Jan 31 '19 at 22:56
  • I see, there's both `FLT_DIG` and `FLT_DECIMAL_DIG` which are different things – M.M Jan 31 '19 at 23:00
  • 1
    I don't know how much you know or don't know, so that's a tough question to answer. A couple of hints: 1) Stephen Prata's book is not "wrong", 2) C uses IEEE-754, and IEEE-754 is rigorously defined. Including "exceptions". 3) Here is an excellent link to "start" with: [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – paulsm4 Feb 01 '19 at 00:08
  • 2
    @paulsm4: The C standard does not say that implementations use IEEE 754 (or its equivalent, IEC 60559). The C standard offers Annex F, which specifies use of IEC 60559, as an **option** that C implementations may adopt. I do not know of one that has adopted it. Many C implementations use IEEE-754 formats but fail to conform to it in various ways. – Eric Postpischil Feb 01 '19 at 00:28