Numeric comparison difficulty in R

Question

I'm trying to compare two numbers in R as part of an if-statement condition:

(a-b) >= 0.5

In this particular instance, a = 0.58 and b = 0.08... and yet (a-b) >= 0.5 is false. I'm aware of the dangers of using == for exact number comparisons, and this seems related:

(a - b) == 0.5 is false, while

all.equal((a - b), 0.5) is true.

The only solution I can think of is to have two conditions: (a-b) > 0.5 | all.equal((a-b), 0.5). This works, but is that really the only solution? Should I just swear off the = family of comparison operators (==, >=, <=) forever?

Edit for clarity: I know that this is a floating-point problem. More fundamentally, what I'm asking is: what should I do about it? What's a sensible way to deal with greater-than-or-equal-to comparisons in R, since >= can't really be trusted?
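A minimal base-R sketch that makes the floating-point issue in the question visible, using the question's values. It only uses `print`, `sprintf`, `all.equal` and `isTRUE`; the point is that R's default printing hides the digits that make the comparison fail:

```r
a <- 0.58
b <- 0.08

# R prints 0.5 at its default 7 significant digits...
print(a - b)
# ...but printing more digits shows the stored difference is just below 0.5
sprintf("%.20f", a - b)

(a - b) >= 0.5                  # FALSE: the difference is slightly under 0.5
isTRUE(all.equal(a - b, 0.5))   # TRUE: equal within all.equal's default tolerance
0.5 - (a - b)                   # a tiny positive gap, roughly 5.6e-17
```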

Your solution based on two conditions, `(a-b) > 0.5 | all.equal((a-b), 0.5)`, is wrong in many cases and must not be used: `a <- 4.005; b <- 4.002; a-b > 0.5 | all.equal(a-b, 0.5) # Error in a - b > 0.5 | all.equal(a - b, 0.5) : operations are possible only for numeric, logical or complex types`. This happens because `all.equal` returns either TRUE or a character string describing the difference, so a "logical | character" type incompatibility occurs. I'll show how to prevent this type incompatibility. – Erdogan CEVHER, May 02 '19 at 21:34
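The error in this comment can be reproduced, and avoided, by wrapping `all.equal` in `isTRUE()`, which always yields a single `TRUE` or `FALSE`. A sketch using the comment's values; since `isTRUE` returns one value, this form suits scalar comparisons only:

```r
a <- 4.005
b <- 4.002

# all.equal() returns TRUE, or a character string describing the difference
res <- all.equal(a - b, 0.5)
is.character(res)        # TRUE: a "Mean relative difference: ..." message

# Combining that string with a logical via | raises the type error quoted above
err <- tryCatch((a - b) > 0.5 | res, error = function(e) conditionMessage(e))
print(err)

# isTRUE() turns any non-TRUE result into FALSE, so the combination is safe
(a - b) > 0.5 | isTRUE(all.equal(a - b, 0.5))               # FALSE
(0.58 - 0.08) > 0.5 | isTRUE(all.equal(0.58 - 0.08, 0.5))   # TRUE
```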

John · Accepted Answer · 2015-05-19T19:44:23.703

49

I've never been a fan of all.equal for such things. It seems to me the tolerance works in mysterious ways sometimes. Why not just check whether the difference is greater than or equal to 0.5 minus a small tolerance?

tol = 1e-5

(a-b) >= (0.5-tol)

In general, rather than rounding, I find plain logic with an explicit tolerance clearer than all.equal.

If x == y, then x-y == 0. Since x-y may not be exactly 0 in floating point, for such cases I use

abs(x-y) <= tol

You have to set a tolerance for all.equal anyway, and this is more compact and straightforward.
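The two checks in this answer can be wrapped into small helpers. A sketch assuming an absolute tolerance, as the answer uses; the names `ge_abs` and `eq_abs` are hypothetical. Both are vectorized, because they use only arithmetic and comparison operators:

```r
# Hypothetical helpers using an absolute tolerance, as in this answer
ge_abs <- function(x, y, tol = 1e-5) x >= y - tol        # x >= y, allowing a slack of tol
eq_abs <- function(x, y, tol = 1e-5) abs(x - y) <= tol   # |x - y| within tol

ge_abs(0.58 - 0.08, 0.5)          # TRUE
ge_abs(c(0.49, 0.5, 0.51), 0.5)   # FALSE TRUE TRUE
eq_abs(sqrt(2)^2, 2)              # TRUE
```

An absolute tolerance such as 1e-5 assumes you know the scale of your numbers: it is larger than values around 1e-8 themselves, and it is effectively no tolerance at all for values around 1e15, where adjacent doubles are about 0.125 apart.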

edited May 19 '15 at 19:44

answered May 05 '10 at 00:25

John


What is wrong with `all.equal`? It uses [a fairly sensible default value](https://stackoverflow.com/a/9508558) -- the square root of eps. Yes, larger tolerances are sometimes needed, but you can specify those as well. – Josiah Yoder Aug 03 '21 at 19:17
Perhaps because it also checks that the types match? I found myself using `isTRUE(all.equal(as.numeric(v1), as.numeric(v2)))`. – Josiah Yoder Aug 03 '21 at 21:13
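One concrete way `all.equal` is stricter than a plain numeric comparison is that, by default, it also compares attributes such as names. A small sketch; the vectors `v1` and `v2` are made up for illustration:

```r
v1 <- c(x = 0.58 - 0.08)   # named numeric vector
v2 <- 0.5                  # unnamed

# all.equal reports the names mismatch, so isTRUE() gives FALSE
isTRUE(all.equal(v1, v2))

# as.numeric() drops attributes such as names, leaving only the values to compare
isTRUE(all.equal(as.numeric(v1), as.numeric(v2)))

# alternatively, switch off the attribute check
isTRUE(all.equal(v1, v2, check.attributes = FALSE))
```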

Shane · Answer 2 · 2010-05-05T00:48:54.930

15

You could create this as a separate operator or overwrite the original >= function (probably not a good idea) if you want to use this approach frequently:

# using a tolerance
epsilon <- 1e-10 # set this as a global setting
`%>=%` <- function(x, y) (x + epsilon > y)

# as a new operator with the original approach
`%>=%` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))

# overwriting R's version (not advised)
`>=` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))

> (a-b) >= 0.5
[1] TRUE
> c(1,3,5) >= 2:4
[1] FALSE FALSE  TRUE
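Both `all.equal`-based versions above compare whole vectors, not individual elements. `all.equal(c(1,3,5), 2:4)` is not TRUE, so only the `x > y` part applies and the middle pair (3 vs 3) comes out FALSE. Base R's `c(1,3,5) >= 2:4` gives `FALSE TRUE TRUE`. A sketch of an elementwise variant, assuming R's usual recycling of the shorter argument (it recycles silently rather than warning on mismatched lengths). It keeps the `%>=%` name rather than overriding `>=`:

```r
`%>=%` <- function(x, y) {
  # Recycle both arguments to a common length, roughly as R's comparison operators do
  n <- if (length(x) && length(y)) max(length(x), length(y)) else 0L
  x <- rep_len(x, n)
  y <- rep_len(y, n)
  # Test each pair separately; isTRUE() maps all.equal's character messages to FALSE
  near <- vapply(seq_len(n), function(i) isTRUE(all.equal(x[i], y[i])), logical(1))
  (x > y) | near
}

c(1, 3, 5) %>=% 2:4                # FALSE TRUE TRUE
c(0.58 - 0.08, 1) %>=% c(0.5, 2)   # TRUE FALSE
```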

edited May 05 '10 at 00:48

answered May 04 '10 at 23:21

Shane



Personally I think this is the best approach, because you don't have to decide on the epsilon yourself. You could even take a page from Perl, and give them names like `ge`, `le`, and `ne`. – Ken Williams May 05 '10 at 18:15

score 13 · Answer 3 · answered May 05 '10 at 00:12


For completeness' sake, I'll point out that, in certain situations, you could simply round to a few decimal places (this is kind of a lame solution compared with the better solution posted previously).

round(0.58 - 0.08, 2) == 0.5

answered May 05 '10 at 00:12

icio



I think it's the best solution, and for the original problem I will use `round(a-b, 10) >= 0.5` (10 digits should leave enough headroom for future extensions). – Marek May 05 '10 at 11:53
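A sketch of the rounding approach as a reusable check. The helper name `ge_round` is hypothetical, and rounding the difference rather than each operand separately is a choice made in this sketch, not something the answer prescribes:

```r
# Hypothetical helper: round the difference, then compare against zero.
# With digits = 10, gaps smaller than about 5e-11 round to zero and count as equal.
ge_round <- function(x, y, digits = 10) round(x - y, digits) >= 0

ge_round(0.58 - 0.08, 0.5)    # TRUE
ge_round(0.49, 0.5)           # FALSE
ge_round(0.5 - 3e-11, 0.5)    # TRUE: the gap rounds to zero
ge_round(0.5 - 6e-11, 0.5)    # FALSE: the gap rounds to -1e-10
```

Like an explicit tolerance, this still draws a hard cutoff; it just sits at a rounding boundary instead of at `y - tol`.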

score 8 · Answer 4 · answered Apr 19 '17 at 07:44

One more comment: all.equal is a generic. For numeric values, it dispatches to all.equal.numeric. Inspecting this function shows that its default tolerance is .Machine$double.eps^0.5, where .Machine$double.eps is defined as

double.eps: the smallest positive floating-point number ‘x’ such that
          ‘1 + x != 1’.  It equals ‘double.base ^ ulp.digits’ if either
          ‘double.base’ is 2 or ‘double.rounding’ is 0; otherwise, it
          is ‘(double.base ^ double.ulp.digits) / 2’.  Normally
          ‘2.220446e-16’.

(.Machine manual page).

In other words, that would be an acceptable choice for your tolerance:

myeq <- function(a, b, tol=.Machine$double.eps^0.5)
      abs(a - b) <= tol
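One difference worth knowing: `myeq` applies the tolerance to the absolute difference. By contrast, `all.equal.numeric` divides the mean absolute difference by the mean absolute value of the target, and falls back to the absolute difference only when that mean is not larger than the tolerance. A sketch that contrasts the two and roughly approximates all.equal's rule elementwise; `myeq` is repeated so the block runs on its own, and `myeq_rel` is a hypothetical name:

```r
myeq <- function(a, b, tol = .Machine$double.eps^0.5)
  abs(a - b) <= tol

# Hypothetical elementwise approximation of all.equal.numeric's rule:
# relative difference when |a| exceeds tol, absolute difference otherwise
myeq_rel <- function(a, b, tol = .Machine$double.eps^0.5) {
  scale <- abs(a)
  ifelse(scale > tol, abs(a - b) / scale <= tol, abs(a - b) <= tol)
}

myeq(1e10, 1e10 + 1)                # FALSE: absolute gap of 1
isTRUE(all.equal(1e10, 1e10 + 1))   # TRUE: relative gap of 1e-10
myeq_rel(1e10, 1e10 + 1)            # TRUE
myeq_rel(0.58 - 0.08, 0.5)          # TRUE
```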

score 4 · Answer 5 · answered May 05 '10 at 00:25


Choose some tolerance level:

epsilon <- 1e-10

Then use

(a-b+epsilon) >= 0.5

answered May 05 '10 at 00:25

Rob Hyndman


score 3 · Answer 6 · answered May 05 '10 at 21:39

But if you're using tolerances anyway, why do you care that a-b == .5 (in exact arithmetic) isn't classified as >= .5? By using tolerances you are already saying you don't care exactly about the end points.

Here is what is true of if( (a-b) >= .5) and if( (a-b) < .5):

exactly one of those will evaluate to TRUE for every pair of (non-NaN) doubles. Any code that tests one of them implicitly sends everything else to the other branch, even if that branch does nothing. If you're using tolerances to get an actual .5 included in the first, but your problem is defined on a continuous domain, you aren't accomplishing much. In most problems involving continuous values there is very little point to that, since values genuinely above .5 will always evaluate as they should. Values arbitrarily close to .5 will go to the "wrong" branch, but in continuous problems where you are using appropriate precision that doesn't matter.

The only time that tolerances make sense is when you are dealing with problems of the type if( (a-b) == c) and if( (a-b) != c).

Here no amount of "appropriate precision" can help you, because you have to be prepared for the second to evaluate to TRUE almost always, unless you set the bits of a-b by hand at a very low level, when in fact you probably want the first to be TRUE sometimes.
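A short illustration of the `==` case, using only base R; the sequence and the target value 0.3 are just an example. The fourth element of `seq(0, 1, by = 0.1)` is computed as `3 * 0.1`, which is not the same double as the literal `0.3`:

```r
s <- seq(0, 1, by = 0.1)

any(s == 0.3)                                      # FALSE: no element is exactly 0.3
s[4] - 0.3                                         # a tiny nonzero gap, about 5.6e-17
any(abs(s - 0.3) <= sqrt(.Machine$double.eps))     # TRUE once a tolerance is used
which(abs(s - 0.3) <= sqrt(.Machine$double.eps))   # 4
```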

Erdogan CEVHER · Answer 7 · 2019-05-20T15:56:30.057

The difficulties with <= and >= comparisons on floating-point numbers are not specific to R.

IsSmallerOrEqual <- function(a, b) {   # To check a <= b
  # Check whether "Mean relative difference..." exists in all.equal's result;
  # if it does, the result is a character string, not a logical
  if (class(all.equal(a, b)) == "logical" && (a < b | all.equal(a, b))) {
    return(TRUE)
  } else if (a < b) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

IsSmallerOrEqual(abs(-2-(-2.2)), 0.2) # TRUE; To check |-2-(-2.2)| <= 0.2
IsSmallerOrEqual(abs(-2-(-2.2)), 0.3) # TRUE
IsSmallerOrEqual(abs(-2-(-2.2)), 0.1) # FALSE

IsBiggerOrEqual <- function(a, b) {   # To check a >= b
  # Check whether "Mean relative difference..." exists in all.equal's result;
  # if it does, the result is a character string, not a logical
  if (class(all.equal(a, b)) == "logical" && (a > b | all.equal(a, b))) {
    return(TRUE)
  } else if (a > b) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

IsBiggerOrEqual(3,3) # TRUE
IsBiggerOrEqual(4,3) # TRUE
IsBiggerOrEqual(3,4) # FALSE
IsBiggerOrEqual(0.58 - 0.08,0.5)  # TRUE

If the type of all.equal's result is not checked first, we may run into the type error shown in my comment on the question.

The following is not necessary but useful:

abs(-2-(-2.2)) # 0.2

sprintf("%.54f",abs(-2-(-2.2)))  # "0.200000000000000177635683940025046467781066894531250000"
sprintf("%.54f",0.2)             # "0.200000000000000011102230246251565404236316680908203125"

all.equal(abs(-2-(-2.2)), 0.2)  # TRUE; check nearly equivalence of floating point numbers
identical(abs(-2-(-2.2)), 0.2)  # FALSE; check exact equivalence of floating point numbers
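For scalar inputs, the logic of `IsSmallerOrEqual` and `IsBiggerOrEqual` can be written more compactly, because `isTRUE()` already maps all.equal's character result to FALSE. A sketch; the names `is_le` and `is_ge` are hypothetical, and like the functions above they expect single numbers, not vectors:

```r
# Hypothetical compact equivalents of IsSmallerOrEqual / IsBiggerOrEqual
is_le <- function(a, b) a < b || isTRUE(all.equal(a, b))   # a <= b, with tolerance
is_ge <- function(a, b) a > b || isTRUE(all.equal(a, b))   # a >= b, with tolerance

is_le(abs(-2 - (-2.2)), 0.2)   # TRUE
is_le(abs(-2 - (-2.2)), 0.1)   # FALSE
is_ge(0.58 - 0.08, 0.5)        # TRUE
is_ge(3, 4)                    # FALSE
```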

score 0 · Answer 8 · answered Jul 11 '23 at 15:03

In case it helps anyone, I've recently started using these helpers.

I'm not sure they're entirely correct, though, so I'm happy to hear of any examples where they might not work.

double_equal <- function(x, y, tol = sqrt(.Machine$double.eps)){
  abs(x - y) < tol
}
double_gt <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) > tol
}
double_gte <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) > -tol
}
double_lt <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) < -tol
}
double_lte <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) < tol
}

x <- sqrt(2)^2
y <- 2
double_gte(x, y)
#> [1] TRUE
double_gte(y, x)
#> [1] TRUE

double_gt(x, y)
#> [1] FALSE
double_gt(y, x)
#> [1] FALSE

double_lte(x, y)
#> [1] TRUE
double_lte(y, x)
#> [1] TRUE

double_lt(x, y)
#> [1] FALSE
double_lt(y, x)
#> [1] FALSE
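Because these helpers use an absolute tolerance of about 1.5e-8, one class of inputs where they may not behave as intended is numbers much smaller than 1. A sketch that repeats three of the definitions above so it runs on its own:

```r
double_equal <- function(x, y, tol = sqrt(.Machine$double.eps)){
  abs(x - y) < tol
}
double_gt <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) > tol
}
double_gte <- function(x, y, tol = sqrt(.Machine$double.eps)){
  (x - y) > -tol
}

# 2e-10 is twice 1e-10, but the gap is far below the default tolerance
double_equal(2e-10, 1e-10)   # TRUE
double_gt(2e-10, 1e-10)      # FALSE
double_gte(1e-10, 2e-10)     # TRUE

# Passing a tolerance suited to the scale of the data restores the distinction
double_equal(2e-10, 1e-10, tol = 1e-12)   # FALSE
```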
