3

I'm struggling with issues re. floating point accuracy, and could not find a solution.

Here is a short example:

aa<-c(99.93029, 0.0697122)
aa
[1] 99.9302900  0.0697122
aa[1]
99.93029
print(aa[1],digits=20)
99.930289999999999

It would appear that, upon storing the vector, R converted the numbers to something with a slightly different internal representation (yes, I have read circle 1 of the "R inferno" and similar material).

How can I force R to store the input values exactly "as is", with no modification?

In my case, my problem is that the values are processed in such a way that the small errors very quickly grow:

aa[2]/(100-aa[1])*100
[1] 100.0032 ## Should be 100, of course !
print(aa[2]/(100-aa[1])*100,digits=20)
[1] 100.00315593171625

So I need to find a way to get my normalization right.

Thanks

PS- There are many questions on this site and elsewhere, discussing the issue of apparent loss of precision, i.e. numbers displayed incorrectly (but stored right). Here, for instance: How to stop read.table from rounding numbers with different degrees of precision in R? This is a distinct issue, as the number is stored incorrectly (but displayed right).

(R version 3.2.1 (2015-06-18), win 7 x64)

tonytonov
  • 25,060
  • 16
  • 82
  • 98
jfmoyen
  • 495
  • 2
  • 11
  • I believe your question is not about reading tables, but about the numerical precision of R. https://stackoverflow.com/questions/24847918/extreme-numerical-values-in-floating-point-precision-in-r – amatsuo_net Jun 22 '17 at 08:16

3 Answers3

3

Floating point precision has always generated lots of confusion. The crucial idea to remember is: when you work with doubles, there is no way to store each real number "as is", or "exactly right" -- the best you can store is the closest available approximation. So when you type (in R or any other modern language) something like x = 99.93029, you'll get this number represented by 99.930289999999999.

Now when you expect a + b to be "exactly 100", you're being inaccurate in terms. The best you can get is "100 up to N digits after the decimal point" and hope that N is big enough. In your case it would be correct to say 99.9302900 + 0.0697122 is 100 with 5 decimal points of accuracy. Naturally, by multiplying that equality by 10^k you'll lose additional k digits of accuracy.

So, there are two solutions here:

a. To get more precision in the output, provide more precision in the input.

bb <- c(99.93029, 0.06971) 
print(bb[2]/(100-bb[1])*100, digits = 20)
[1] 99.999999999999119

b. If double precision not enough (can happen in complex algorithms), use packages that provide extra numeric precision operations. For instance, package gmp.

tonytonov
  • 25,060
  • 16
  • 82
  • 98
  • Thks for the clarification, it helps a lot to set the problem in the right terms ! However I do not understand the difference between your example using bb and mine with aa ? How are you supplying "more precision in the input" in this case ? – jfmoyen Jun 22 '17 at 10:13
  • @jfmoyen `sum(aa)` is 100.0000022, while `sum(bb)` is 100 up to 10^-15. Another way is `cc <- c(99.9302878, 0.0697122)`. – tonytonov Jun 22 '17 at 13:10
0

i guess you have misunderstood here. It's the same case where R is storing the correct value but the value is displayed accordingly to the value of option chosen while displaying it. For Eg:

# the output of below will be:
> print(99.930289999999999,digits=20)
[1] 99.930289999999999395

But

# the output of:
> print(1,digits=20)
[1] 1

Also

> print(1.1,digits=20)
[1] 1.1000000000000000888
ihm017
  • 182
  • 9
  • Hmmmmmmm yeees... I see what you mean the last example is similar to mine. But this implies that 1 is stored as exactly 1, and 1.1 is stored as 1.1 + 1e-16, isn't it ? I suppose this is not predictable/not under user's control. I may get some numbers "right", and some "wrong". I can understand that, but then how would you suggest I cure the problem (and, specifically, get `a2/(100-a1) = 1`, or indeed close enough to 1) ? – jfmoyen Jun 22 '17 at 09:00
  • @jfmoyen, will `signif` function to cure problem of rounding `round(aa[2]/(100-aa[1]), digits = 1)` to 1 work in your case ? – parth Jun 22 '17 at 10:02
  • @jfmoyen .. you do not have to include digit option while printing. just use `> print(a2/(100-a1) = 1)` `[1] 100` – ihm017 Jun 22 '17 at 10:21
  • @imh017, I realize that but the problem arises from the fact that my value "not close enough to 100" (as you explain neatly in the other reply) is reused downstream and the errors add up very quickly. It's not a problem of displaying the results. – jfmoyen Jun 22 '17 at 11:05
  • @Parth, I guess I'll end up juggling with round/signif to keep things in check, indeed. – jfmoyen Jun 22 '17 at 11:06
0

In addition to previous answers, I think that a good lecture regarding the subject would be

R Inferno, by P.Burns

http://www.burns-stat.com/documents/books/the-r-inferno/

Filippo Mazza
  • 4,339
  • 4
  • 22
  • 25
  • 1
    I'm aware of that, I mentioned it in my original question :-) Sadly the "Inferno" does not suggest solutions... – jfmoyen Jun 22 '17 at 11:59
  • Ooops :D No actually there is no (simple) solution in general. But at least it suggests that it is a known issue :) – Filippo Mazza Jun 22 '17 at 16:37