I'm having difficulty using %in% with floating-point values, e.g.

> x = seq(0.05, 0.3, 0.01)
> x %in% seq(0.15, 0.3, 0.01)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25] FALSE  TRUE

I know this is because of how computers store floating-point numbers, but is there a function like dplyr::near that could be used in place of %in%? dplyr::near(x, y) won't work if x and y have different lengths.

Many thanks!

Min
  • Does this answer your question? [Why are these numbers not equal?](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) – camille Apr 10 '20 at 03:45
  • Based on the link I posted, try `sprintf("%.20f", x)`. On my computer, one of the values I expected to come back as true that came back false was 0.29; looking at all those decimal places, it's actually being treated as 0.28999999999999998002 – camille Apr 10 '20 at 03:48
  • @camille The OP already seems to understand why their code is not working: floating-point imprecision. The question here is how to work around the problem. – Tim Biegeleisen Apr 10 '20 at 03:50
  • @camille Unfortunately not. I know it's due to how the machine stores decimals in binary format, but in that question the vectors x and y being compared are assumed to have equal lengths. I'm asking about x %in% y when the lengths differ. – Min Apr 10 '20 at 03:56

3 Answers

Using floats rounded to two decimal places seems to work:

x <- round(seq(0.05, 0.3, 0.01), 2)
x %in% round(seq(0.15, 0.3, 0.01), 2)

                                                                 ^^ 0.15
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE   <-- 0.3
Tim Biegeleisen

You could use dplyr::near here, but since near does a pairwise comparison and you need to compare each element against any value in the other vector, wrap it in sapply:

check_values <- seq(0.15, 0.3, 0.01)
sapply(x, function(x) any(dplyr::near(x, check_values)))

#[1]  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
#[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#[25]  TRUE  TRUE
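
A vectorized variant of the same idea, as a minimal sketch only: `near_in` is a hypothetical helper name (it is not part of dplyr), and its default tolerance is an assumption chosen to mirror the `.Machine$double.eps^0.5` default of `dplyr::near`. It uses base R's `outer` to compare every element of `x` against every element of the lookup vector.

```r
# Hypothetical helper (not a dplyr function): tolerance-based membership test.
near_in <- function(x, table, tol = sqrt(.Machine$double.eps)) {
  # outer() builds a length(x)-by-length(table) matrix of absolute differences
  diffs <- abs(outer(x, table, "-"))
  # an element of x counts as "in" table if any difference is below tol
  rowSums(diffs < tol) > 0
}

x <- seq(0.05, 0.3, 0.01)
near_in(x, seq(0.15, 0.3, 0.01))
# 10 FALSE values (0.05 to 0.14) followed by 16 TRUE values (0.15 to 0.30)
```

This avoids the per-element function call, at the cost of holding a length(x)-by-length(table) matrix in memory, so it is probably best suited to small or medium-sized vectors.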
Ronak Shah

Convert both sides to character with as.character:

as.character(x) %in% as.character(seq(0.15, 0.3, 0.01))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [10] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [19]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

This also seems to work fine for more complicated cases. Consider:

x <- c(.2999, .3, .2499, .25)
y <- c(.299, .3, .249, .25)

as.character(x) %in% as.character(y)
# [1] FALSE  TRUE FALSE  TRUE

When rounding, we need to calculate the digits correctly to generalize,

round(x, 3) %in% round(y, 3)
# [1] TRUE TRUE TRUE TRUE
round(x, 4) %in% round(y, 4)
# [1] FALSE  TRUE FALSE  TRUE

which can be automated (the `- 2` accounts for the leading `0.`, so this assumes values that print in the form 0.xxx):

d <- max(nchar(c(x, y))) - 2
round(x, d) %in% round(y, d)
# [1] FALSE  TRUE FALSE  TRUE

We could wrap both solutions in infix operators:

`%in2%` <- function(x, y) {
  d <- max(nchar(c(x, y))) - 2
  round(x, d) %in% round(y, d)
}
`%in3%` <- function(x, y) {
  as.character(x) %in% as.character(y)
}
x %in2% y
# [1] FALSE  TRUE FALSE  TRUE
x %in3% y
# [1] FALSE  TRUE FALSE  TRUE
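
One likely reason the `as.character` comparison works: R's documentation describes `as.character` as representing doubles to 15 significant digits, which drops the trailing digits where the representation error lives. A minimal sketch of that effect, using illustrative values that are not taken from the answer above:

```r
a <- 0.1 + 0.2   # carries a tiny binary rounding error
b <- 0.3

a == b                               # FALSE: the two doubles differ in the last bits
sprintf("%.17g", c(a, b))            # 17 significant digits expose the difference
as.character(a) == as.character(b)   # TRUE: both are formatted as "0.3"
```

The flip side is that two values that genuinely differ only beyond the 15th significant digit would also compare as equal, so this behaves like a tolerance-based comparison rather than an exact one.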
jay.sf