
I often end up in situations where I need to check whether an obtained difference is above machine precision. R seems to have a handy variable for this purpose: .Machine$double.eps. However, when I turn to the R source code for guidelines on using this value, I see multiple different patterns.

Examples

Here are a few examples from the stats library:

t.test.R

if(stderr < 10 *.Machine$double.eps * abs(mx))

chisq.test.R

if(abs(sum(p)-1) > sqrt(.Machine$double.eps))

integrate.R

rel.tol < max(50*.Machine$double.eps, 0.5e-28)

lm.influence.R

e[abs(e) < 100 * .Machine$double.eps * median(abs(e))] <- 0

princomp.R

if (any(ev[neg] < - 9 * .Machine$double.eps * ev[1L]))

etc.
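To make the question concrete, here is a small sketch of my own (not from the R sources) showing why a bare .Machine$double.eps threshold has to be scaled by the magnitude of the values, as in the t.test.R example above:

```r
eps <- .Machine$double.eps

# Near 1 the spacing between adjacent doubles is eps itself,
# so a difference of eps is meaningful:
1 + eps == 1
#[1] FALSE

# Near 1e10 the spacing is about 2e-6, so eps is far below
# anything the representation can resolve there:
1e10 + eps == 1e10
#[1] TRUE

# Scaling the threshold by abs(x) restores a meaningful comparison:
1e10 + eps * 1e10 == 1e10
#[1] FALSE
```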

Questions

  1. How can one understand the reasoning behind all those different 10 *, 100 *, 50 * and sqrt() modifiers?
  2. Are there guidelines about using .Machine$double.eps for adjusting differences due to precision issues?
Roland
Karolis Koncevičius
  • Related: [Floating point less-than-equal comparisons after addition and substraction](https://stackoverflow.com/questions/46739569/floating-point-less-than-equal-comparisons-after-addition-and-substraction) – Henrik Dec 12 '19 at 08:15
  • [Rule of thumb to test the equality of two doubles in C#?](https://stackoverflow.com/questions/3103782/rule-of-thumb-to-test-the-equality-of-two-doubles-in-c) – Henrik Dec 12 '19 at 09:14
  • Thus, both posts conclude that "the reasonable degree of certainty" depends on your application. As a case study, you may check [this post on R-devel](https://stat.ethz.ch/pipermail/r-devel/2005-April/032732.html); "Aha! 100 times machine precision in not all that much when the numbers themselves are in double digits." (Peter Dalgaard, member of the R Core team) – Henrik Dec 13 '19 at 09:53
  • @Henrik thank you! These are all very relevant. However the general trend I am picking up is that there are no guides about using this value and there cannot be. And also one should set a threshold much higher than `double.eps` itself. Maybe that's why `sqrt()` is used so much. – Karolis Koncevičius Dec 14 '19 at 21:07
  • @KarolisKoncevičius, I don't think it is that simple. It has to do with the general errors present in floating point math and how many operations you execute on them. If you are simply comparing two floating point numbers, use `double.eps`. If you are performing several operations on a floating point number, then your error tolerance should also adjust. This is why [all.equal](https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html) gives you a `tolerance` argument. – Joseph Wood Dec 16 '19 at 22:56
  • Thanks @JosephWood . The comment that tolerance should be dependent on the number of operations used rings true. Maybe there are guidelines of this sort? i.e. if you multiply two floating point numbers your tolerance should be set to `x`, if you add or subtract - to `y`, etc? – Karolis Koncevičius Dec 17 '19 at 19:48
  • Another (hopefully helpful) post: [Comparing Floating Point Numbers, 2012 Edition](https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/). E.g. [a comment reminding of your question](https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/#comment-379): "I’m finding it difficult to decide how many ULPs to allow between floating point numbers in different situations. There isn’t a general rule suggested, so I was wondering if you had a practical relationship to use?" – Henrik Dec 18 '19 at 23:21
  • Perhaps you can get some ideas here as well: [What is the most effective way for float and double comparison?](https://stackoverflow.com/questions/17333/what-is-the-most-effective-way-for-float-and-double-comparison). If you want a more R-specific answer on the "reasoning behind", you may try to (re-)post your question on the R-help mailing list, where I believe it is more likely to be read by R developers than here on SO. Cheers – Henrik Dec 18 '19 at 23:31
  • Have also a look at [Implementation of nextafter functionality in R](https://stackoverflow.com/q/22047238/10488504), which will give you the next larger double number. – GKi Jan 21 '20 at 16:23

2 Answers


Definition of machine eps: it is the smallest value eps for which 1 + eps is not equal to 1.

As a rule of thumb (assuming a floating point representation with base 2):
This eps makes the difference for the range 1 .. 2,
for the range 2 .. 4 the precision is 2*eps
and so on.
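This scaling with the range can be checked directly in R (a sketch of my own):

```r
eps <- .Machine$double.eps

# In [1, 2) the spacing between adjacent doubles is eps:
1 + eps == 1
#[1] FALSE

# In [2, 4) the spacing is 2 * eps; a bare eps is only half
# a spacing and rounds back to 2 (ties go to even):
2 + eps == 2
#[1] TRUE
2 + 2 * eps == 2
#[1] FALSE

# In [4, 8) the spacing is 4 * eps, and so on:
4 + 2 * eps == 4
#[1] TRUE
```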

Unfortunately, there is no good rule of thumb here. It's entirely determined by the needs of your program.

In R we have all.equal as a built-in way to test approximate equality. So you could use something like (x < y) || isTRUE(all.equal(x, y)); note the isTRUE, since all.equal returns a descriptive string rather than FALSE when the values differ.

i <- 0.1
i <- i + 0.05
if (isTRUE(all.equal(i, 0.15))) {
    cat("i equals 0.15\n")
} else {
    cat("i does not equal 0.15\n")
}
#i equals 0.15
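If a relative rather than absolute comparison is wanted, a helper along these lines could be used (near_rel is a made-up name; this is a sketch, not an established function):

```r
# Compare with a tolerance relative to the magnitude of the inputs,
# similar in spirit to the scaled checks in the stats:: sources:
near_rel <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) <= tol * pmax(abs(x), abs(y))
}

near_rel(0.1 + 0.05, 0.15)
#[1] TRUE
near_rel(1e10, 1e10 + 1)   # 1 is negligible at this magnitude
#[1] TRUE
near_rel(1e-10, 2e-10)     # genuinely different small numbers
#[1] FALSE
```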

Google mock has a number of floating point matchers for double precision comparisons, including DoubleEq and DoubleNear. You can use them in an array matcher like this:

ASSERT_THAT(vec, ElementsAre(DoubleEq(0.1), DoubleEq(0.2)));

Update:

Numerical Recipes provides a derivation showing that, for a one-sided difference quotient, sqrt(eps) is a good choice of step size for finite-difference approximations of derivatives.

The Wikipedia article cites Numerical Recipes, 3rd edition, Section 5.7, which is pages 229-230 (a limited number of page views is available at http://www.nrbook.com/empanel/).
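The trade-off behind that derivation can be seen numerically; this sketch (my own) approximates the derivative of sin at 1 with a forward difference:

```r
# Truncation error shrinks with h, but rounding error grows like eps/h;
# h = sqrt(eps) roughly balances the two.
fd <- function(f, x, h) (f(x + h) - f(x)) / h

eps  <- .Machine$double.eps
true <- cos(1)

abs(fd(sin, 1, sqrt(eps)) - true)  # error around 1e-8
abs(fd(sin, 1, 1e-3) - true)       # larger h: truncation dominates
abs(fd(sin, 1, eps) - true)        # tiny h: rounding error dominates
```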

This sqrt(eps) value is also the default tolerance of all.equal:

all.equal(target, current,
          tolerance = .Machine$double.eps ^ 0.5, scale = NULL,
          ..., check.attributes = TRUE)

The behaviour of IEEE floating point arithmetic is a well-known limitation of computer arithmetic and is discussed in many places.

dplyr::near() is another option for testing whether two vectors of floating point numbers are equal.

The function has a built-in tolerance parameter, tol = .Machine$double.eps^0.5, that can be adjusted. The default is the same as the default tolerance of all.equal().
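For reference, dplyr::near is essentially this one-liner, so the comparison is absolute rather than relative (a sketch reproducing its documented default):

```r
# Equivalent of dplyr::near with its default tolerance:
near <- function(x, y, tol = .Machine$double.eps^0.5) abs(x - y) < tol

near(sqrt(2)^2, 2)
#[1] TRUE

# Being absolute, the tolerance is lax for tiny numbers ...
near(1e-12, 2e-12)
#[1] TRUE
# ... and strict for huge ones that differ by only a couple of ulps:
near(1e20, 1e20 * (1 + 2 * .Machine$double.eps))
#[1] FALSE
```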

Sreeram Nair
  • Thanks for the response. At the moment I think this is too minimal to be an accepted answer. It doesn't seem to address the two main questions from the post. For example it states "it's determined by the needs of your program". It would be nice to show one or two examples of this statement - maybe a small program and how tolerance can be determined by it. Maybe using one of the mentioned R scripts. Also `all.equal()` has its own assumption: the default tolerance there is `sqrt(double.eps)` - why is it the default? Is it a good rule of thumb to use `sqrt()`? – Karolis Koncevičius Dec 17 '19 at 19:46
  • Here is [the code R uses to calculate eps](http://svn.r-project.org/R/trunk/src/main/platform.c) (extracted into its own program). Also I have updated the Answer with numerous discussion points which I had gone through before. Hope the same helps you understand better. – Sreeram Nair Dec 18 '19 at 05:54
  • A sincere +1 for all the effort. But at the current state I still cannot accept the answer. It seems to be a bit far-reaching with a lot of references, but in terms of an actual answer to the 2 posted questions: 1) how to understand the 100x, 50x, etc modifiers in the R `stats::` source, and 2) what are the guidelines; the answer is quite thin. The only applicable sentence seems to be the reference from "Numerical Recipes" about sqrt() being a good default, which is really on point, I feel. Or maybe I am missing something here. – Karolis Koncevičius Dec 18 '19 at 21:09

The machine precision of a double depends on its current value. .Machine$double.eps gives the precision when the value is 1. You can use the C function nextafter to get the machine precision for other values.

library(Rcpp)
cppFunction("double getPrec(double x) {
  return nextafter(x, std::numeric_limits<double>::infinity()) - x;}")

(pr <- getPrec(1))
#[1] 2.220446e-16
1 + pr == 1
#[1] FALSE
1 + pr/2 == 1
#[1] TRUE
1 + (pr/2 + getPrec(pr/2)) == 1
#[1] FALSE
1 + pr/2 + pr/2 == 1
#[1] TRUE
pr/2 + pr/2 + 1 == 1
#[1] FALSE

Adding a value a to a value b will not change b when a is at most half of b's machine precision. Checking whether the difference is smaller than the machine precision is done with <. The modifiers might account for typical cases in which several such additions fail to show a change.
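This can be seen without Rcpp at a magnitude where the spacing is a whole number (my own sketch):

```r
# At 2^53 the spacing between adjacent doubles is exactly 2,
# so an increment of 1 (half the spacing) is rounded away:
2^53 + 1 == 2^53
#[1] TRUE
2^53 + 2 == 2^53
#[1] FALSE

# One binade lower the spacing is 1, and the same +1 survives:
2^52 + 1 == 2^52
#[1] FALSE
```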

In R the machine precision can be estimated with:

getPrecR <- function(x) {
  y <- log2(pmax(.Machine$double.xmin, abs(x)))
  ifelse(x < 0 & floor(y) == y, 2^(y-1), 2^floor(y)) * .Machine$double.eps
}
getPrecR(1)
#[1] 2.220446e-16

Each double value represents a range. For a simple addition, the range of the result depends on the range of each summand and also on the range of their sum.

library(Rcpp)
cppFunction("std::vector<double> getRange(double x) {return std::vector<double>{
   (nextafter(x, -std::numeric_limits<double>::infinity()) - x)/2.
 , (nextafter(x, std::numeric_limits<double>::infinity()) - x)/2.};}")

x <- 2^54 - 2
getRange(x)
#[1] -1  1
y <- 4.1
getRange(y)
#[1] -4.440892e-16  4.440892e-16
z <- x + y
getRange(z)
#[1] -2  2
z - x - y #Should be 0
#[1] 1.9

2^54 - 2.9 + 4.1 - (2^54 + 5.9) #Should be -4.7
#[1] 0
2^54 - 2.9 == 2^54 - 2      #Gain 0.9
#[1] TRUE
2^54 - 2 + 4.1 == 2^54 + 4  #Gain 1.9
#[1] TRUE
2^54 + 5.9 == 2^54 + 4      #Gain 1.9
#[1] TRUE

For higher precision, Rmpfr can be used.

library(Rmpfr)
mpfr("2", 1024L)^54 - 2.9 + 4.1 - (mpfr("2", 1024L)^54 + 5.9)
#[1] -4.700000000000000621724893790087662637233734130859375

If the values can be scaled to integers, gmp (which Rmpfr also uses) can be used.

library(gmp)
as.bigz("2")^54 * 10 - 29 + 41 - (as.bigz("2")^54 * 10 + 59)
#[1] -47
GKi
  • Thanks a lot. I feel like this is a much better answer. It illustrates a lot of the points nicely. The only thing that is still a bit unclear to me is - can one come up with the modifiers (like *9, etc) on his/her own? And if so how... – Karolis Koncevičius Jan 22 '20 at 14:00
  • I think this modifier is like the significance level in statistics: it increases with the number of operations you have performed, in combination with the chosen risk of rejecting a correct comparison. – GKi Jan 22 '20 at 14:11