
I am reading through The R Inferno, and have run into something I do not understand. In addition to section 8.2.23 in the Inferno, there have been some good questions on comparing floating point numbers: question1, question2.

However, I am still running into a problem with all.equal. With the default tolerance, all.equal gives (mostly) the results I would expect.

> all.equal(2,1.99999997)
[1] "Mean relative difference: 1.5e-08"
> all.equal(2,1.99999998) #I expected FALSE here
[1] TRUE
> all.equal(2,1.99999999)
[1] TRUE

I am not sure why the function returns TRUE at 1.99999998, but that is less concerning than the following behavior, where I specify the tolerance:

> all.equal(2,1.98,tolerance=0.01) #Behaves as expected
[1] "Mean relative difference: 0.01"
> all.equal(2,1.981,tolerance=0.01) #Does not behave as expected
[1] TRUE

Furthermore,

> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE 

But if we compute:

> diff(c(1.981,2))
[1] 0.019

and clearly,

> diff(c(1.981,2)) >= 0.01
[1] TRUE

So, why is all.equal unable to distinguish 2 and 1.981 with a tolerance of 0.01?

EDIT

From the documentation: Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors. If this is smaller than tolerance or not finite, absolute differences are used, otherwise relative differences scaled by the mean absolute difference.

Here I do not understand the behavior. I can see that diff(c(1.981,2)) is not finite:

> sprintf("%.25f",diff(c(1.981,2)))
[1] "0.0189999999999999058530875"

But then what does it get scaled by? When each vector is of length one, the mean absolute difference should equal the difference of the two numbers, and dividing by the mean absolute difference would give 1. Clearly, I am misunderstanding the logic here.
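One way to see where the literal reading goes wrong is a simplified re-implementation of the scale = NULL branch. This is a sketch, not R's actual code: `all_equal_sketch` is a hypothetical helper, and it assumes (based on the R source of all.equal.numeric rather than the quoted help text) that the relative difference is divided by the mean absolute value of the target, the first argument, not by the mean absolute difference. NA handling, attributes and the countEQ argument are ignored.

```r
# all_equal_sketch() is a hypothetical helper, not part of R. It mimics the
# scale = NULL branch of all.equal() for plain numeric vectors, assuming the
# relative difference is divided by the mean absolute *target* value.
# NA handling, attributes and the countEQ argument are left out.
all_equal_sketch <- function(target, current,
                             tolerance = sqrt(.Machine$double.eps)) {
  differ <- target != current
  if (!any(differ)) return(TRUE)       # nothing differs at all
  target  <- target[differ]            # only differing elements are compared
  current <- current[differ]
  xy <- mean(abs(target - current))    # mean absolute difference
  xn <- mean(abs(target))              # mean absolute target value
  if (is.finite(xn) && xn > tolerance) {
    xy   <- xy / xn                    # relative difference
    what <- "relative"
  } else {
    what <- "absolute"                 # target near zero (or not finite)
  }
  if (xy > tolerance) paste("Mean", what, "difference:", format(xy)) else TRUE
}

all_equal_sketch(2, 1.981, tolerance = 0.01)  # TRUE: 0.019 / 2 is about 0.0095
all_equal_sketch(2, 1.98, tolerance = 0.01)   # reports a difference

# Dividing by the mean absolute difference instead, as a literal reading of
# the help text suggests, always gives 1 for a single pair of numbers:
abs(2 - 1.981) / mean(abs(2 - 1.981))         # 1
```

Under this assumption the divisor for all.equal(2, 1.981) is 2, not 0.019, which is why the result is about 0.0095 rather than 1.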

dayne
  • From the help file: `all.equal(x, y)` is a utility to compare R objects x and y testing ‘near equality’ (note: near). – Metrics Sep 13 '13 at 19:59
  • `is.finite` indicates which elements are finite (not infinite and not missing) or infinite (as in `Inf` and `-Inf`), not repeating decimals! – Simon O'Hanlon Sep 13 '13 at 21:49
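For reference, an ordinary double such as diff(c(1.981, 2)) is finite no matter how many digits sprintf prints; only infinities and missing values are not. A minimal check:

```r
# diff(c(1.981, 2)) is an ordinary double, so it is finite
is.finite(diff(c(1.981, 2)))        # TRUE

# Only infinities and missing values are not finite
is.finite(c(Inf, -Inf, NA, NaN))    # FALSE FALSE FALSE FALSE
```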

2 Answers


This has to do with floating point accuracy. The manual isn't entirely clear at first glance, but in your example the mean absolute difference between 2 and 1.981 is 0.019, which is greater than the tolerance of 0.01, and scale is NULL. Therefore the comparison is made on the relative difference, i.e. the difference scaled by the size of the target value. Eh?!

Using a relative tolerance means you care about the magnitude of the numbers involved. A relative difference measures not how big the difference is in absolute terms, but how big it is relative to the numbers being compared. Given the example in the link, the difference between 5 and 6 is more significant (I use the term loosely) than the difference between 1,000,000,000 and 1,000,000,001.

So if the relative difference between the two numbers is not greater than tolerance, the numbers are considered equal. For two single numbers (as in this example), the relative difference is given by:

abs( target - current ) / abs( target )

where target is the first argument and current the second. Here that is

( 2 - 1.981 ) / 2 = 0.0095
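For a single pair of numbers, all.equal(target, current) with scale = NULL divides abs(target - current) by abs(target) (an assumption based on the R source of all.equal.numeric), so the argument order matters slightly. A quick check in R; the variable names are only illustrative:

```r
tol <- 0.01

# all.equal(target, current) with scale = NULL: divide by abs(target)
rel_target_2    <- abs(2 - 1.981) / abs(2)      # about 0.0095
rel_target_1981 <- abs(1.981 - 2) / abs(1.981)  # about 0.00959

rel_target_2 <= tol      # TRUE
rel_target_1981 <= tol   # TRUE

# all.equal() agrees in both argument orders
isTRUE(all.equal(2, 1.981, tolerance = tol))   # TRUE
isTRUE(all.equal(1.981, 2, tolerance = tol))   # TRUE
```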

The tolerance you specified is 0.01, so the numbers are considered equal because the relative difference is less than this. As an aside, if you subtract the relative difference from 2 and add it to 1.981, the two results differ by exactly .Machine$double.eps, the machine epsilon (the spacing between adjacent doubles in [1, 2)):

identical( abs( ( 2 - 0.0095 ) - ( 1.981 + 0.0095 ) ) , .Machine$double.eps )
[1] TRUE
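For context, .Machine$double.eps is the gap between 1 and the next larger double, not the smallest positive double; R reports the smallest normalized double separately as .Machine$double.xmin. A short check, assuming the usual IEEE 754 doubles:

```r
eps <- .Machine$double.eps

eps == 2^-52                  # TRUE for IEEE 754 doubles
(1 + eps) > 1                 # TRUE: 1 + eps is the next double after 1
.Machine$double.xmin < eps    # TRUE: much smaller positive doubles exist
```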

Now try:

all.equal( 2 , 1.981 , 0.00949999999999 )
[1] "Mean relative difference: 0.0095"
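The same rule, with the default tolerance sqrt(.Machine$double.eps), accounts for the first results in the question: 1.99999997 is about 1.5e-08 away from 2 in relative terms, which is just above the default tolerance, while 1.99999998 is about 1e-08 away, which is within it. A minimal check, with illustrative variable names:

```r
# all.equal()'s default tolerance for numeric comparisons
tol <- sqrt(.Machine$double.eps)   # about 1.49e-08

# Relative differences from the target 2 (divisor abs(2))
rel_97 <- abs(2 - 1.99999997) / 2  # about 1.5e-08
rel_98 <- abs(2 - 1.99999998) / 2  # about 1.0e-08

rel_97 > tol   # TRUE: above tolerance, so a difference is reported
rel_98 > tol   # FALSE: within tolerance, so all.equal() returns TRUE
```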
Simon O'Hanlon

This happens because in this case all.equal checks relative differences. If you set scale=1, i.e. no scaling, absolute comparisons will be made and all.equal behaves as you are expecting.

For further details see the documentation on the scale parameter.

> all.equal(2,1.980000000001,tolerance=0.01)
[1] TRUE
> all.equal(2,1.980000000001,tolerance=0.01,scale=1)
[1] "Mean scaled difference: 0.02"
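The borderline results in the question can be checked the same way. This sketch assumes that with scale = NULL the difference is divided by abs(target), here 2, while with scale = 1 it is divided by 1:

```r
tol <- 0.01

# scale = NULL (default): difference divided by abs(target), here 2.
# The double nearest to 1.98 lies slightly below 1.98, so this ratio
# lands just above 0.01 and a difference is reported.
(2 - 1.98) / 2 > tol                 # TRUE
(2 - 1.980000000001) / 2 > tol       # FALSE: about 0.0099999999995

# scale = 1: the difference is divided by 1, i.e. compared in absolute terms
abs(2 - 1.980000000001) / 1 > tol    # TRUE: about 0.02
```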
ROLO
  • Thanks for the clarification about scale. However, I am still stumped by the default behavior with `scale = NULL`. – dayne Sep 13 '13 at 20:25
  • "Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors. If this is smaller than tolerance or not finite, absolute differences are used, otherwise relative differences scaled by the mean absolute difference." (documentation) -- OK, actually I don't get it either. – ROLO Sep 13 '13 at 20:28