Floating point issue when using %in%

Question

I'm having difficulty using %in% with floating-point values, e.g.

> x = seq(0.05, 0.3, 0.01)
> x %in% seq(0.15, 0.3, 0.01)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25] FALSE  TRUE

I know this is caused by how computers store floating-point numbers, but is there a function like dplyr::near that could be used in place of %in%? dplyr::near(x, y) won't work when x and y have different lengths.

Many thanks!

Does this answer your question? [Why are these numbers not equal?](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) — camille, Apr 10 '20 at 03:45
Based on the link I posted, try `sprintf("%.20f", x)`. On my computer, one of the values I expected to come back as true that came back false was 0.29; looking at all those decimal places, it's actually being treated as 0.28999999999999998002 — camille, Apr 10 '20 at 03:48
@camille The OP already seems to understand why their code is not working: floating-point imprecision. The question here is how to work around the problem. — Tim Biegeleisen, Apr 10 '20 at 03:50
@camille Unfortunately not. I know it's due to how the machine stores decimals in binary format, but in that question the vectors x and y being compared are assumed to have equal lengths. I'm asking about x %in% y when the lengths differ. — Min, Apr 10 '20 at 03:56
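
The `sprintf` suggestion from the comments can be narrowed to just the positions where `%in%` misses. The following is a small diagnostic sketch, not part of the original thread; the names `expected` and `missed` are illustrative, and it assumes that `x` from position 11 onwards (0.15 and up) is what should match.

```r
x <- seq(0.05, 0.3, 0.01)
y <- seq(0.15, 0.3, 0.01)

# Assumption: elements 11..26 of x (0.15, 0.16, ..., 0.30) are the intended matches
expected <- seq_along(x) >= 11

# Positions that should match but for which %in% returns FALSE
missed <- which(expected & !(x %in% y))

# Show the stored values on both sides to 20 decimal places
data.frame(
  position = missed,
  x = sprintf("%.20f", x[missed]),
  y = sprintf("%.20f", y[missed - 10])
)
```

Each row pairs a value of `x` with the value of `y` it was meant to equal, so any difference in the trailing digits is visible directly.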

score 1 · Accepted Answer · answered Apr 10 '20 at 03:45

Using floats rounded to two decimal places seems to work:

x <- round(seq(0.05, 0.3, 0.01), 2)
x %in% round(seq(0.15, 0.3, 0.01), 2)

 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE   <-- first TRUE (11th element) is 0.15
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE   <-- 0.3

Tim Biegeleisen


Brilliant solution! How could I miss that? Much appreciated, Tim – Min Apr 10 '20 at 03:57

score 1 · Answer 2 · answered Apr 10 '20 at 03:46

You could use dplyr::near here, but since near does a pairwise comparison and you need to compare each element against every value in the other vector, wrap it in sapply:

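x <- seq(0.05, 0.3, 0.01)  # x as defined in the question
check_values <- seq(0.15, 0.3, 0.01)
# For each element of x, test whether it is near any value in check_values
sapply(x, function(xi) any(dplyr::near(xi, check_values)))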

#[1]  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
#[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#[25]  TRUE  TRUE
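
The same tolerance-based idea can be written without dplyr. The following is a minimal sketch, assuming base R only; `near_in` is a hypothetical helper name, not an existing function, and its default tolerance is chosen to mirror the `sqrt(.Machine$double.eps)` default of dplyr::near.

```r
# Hypothetical helper (not part of base R or dplyr): TRUE where an element
# of x lies within `tol` of at least one element of `table`.
near_in <- function(x, table, tol = sqrt(.Machine$double.eps)) {
  # Absolute difference for every (x[i], table[j]) pair,
  # as a length(x) by length(table) matrix
  diffs <- abs(outer(x, table, "-"))
  # An element of x matches if any difference in its row is below tol
  rowSums(diffs < tol) > 0
}

x <- seq(0.05, 0.3, 0.01)
near_in(x, seq(0.15, 0.3, 0.01))
```

Because `outer` builds the full matrix of pairwise differences, memory use grows with `length(x) * length(table)`, so this form suits short vectors like the ones in the question.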

jay.sf · Answer 3 · 2020-04-10T04:40:09.593

Converting both sides with as.character also works:

as.character(x) %in% as.character(seq(0.15, 0.3, 0.01))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [10] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [19]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

This also seems to work fine for more complicated cases. Consider:

x <- c(.2999, .3, .2499, .25)
y <- c(.299, .3, .249, .25)

as.character(x) %in% as.character(y)
# [1] FALSE  TRUE FALSE  TRUE

When rounding, the number of digits must be chosen correctly for the approach to generalize,

round(x, 3) %in% round(y, 3)
# [1] TRUE TRUE TRUE TRUE
round(x, 4) %in% round(y, 4)
# [1] FALSE  TRUE FALSE  TRUE

which can be automated:

d <- max(nchar(c(x, y))) - 2
round(x, d) %in% round(y, d)
# [1] FALSE  TRUE FALSE  TRUE
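
To see where `d` comes from, it may help to look at the strings that `nchar` actually measures. This is an illustrative breakdown of the line above, assuming (as in this example) that all values lie between 0 and 1 and print in fixed notation, so that the leading "0." accounts for the two subtracted characters.

```r
x <- c(.2999, .3, .2499, .25)
y <- c(.299, .3, .249, .25)

# nchar() coerces the numbers to character strings before counting
strings <- as.character(c(x, y))
strings

# Number of characters in each string, including the leading "0."
widths <- nchar(c(x, y))
widths

# Longest string minus the two characters of "0." gives the decimal places
d <- max(widths) - 2
d
```

Here the longest strings are "0.2999" and "0.2499" with six characters each, so `d` is 4.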

Both solutions can be wrapped into custom infix operators:

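# Rounding variant: infer the number of decimal places, then compare rounded values
`%in2%` <- function(x, y) {
  d <- max(nchar(c(x, y))) - 2
  round(x, d) %in% round(y, d)
}
# String variant: compare the character representations
`%in3%` <- function(x, y) {
  as.character(x) %in% as.character(y)
}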
x %in2% y
# [1] FALSE  TRUE FALSE  TRUE
x %in3% y
# [1] FALSE  TRUE FALSE  TRUE

This is just... marvelous! I wish I could accept two answers. Thanks, Jay! — Min, Apr 10 '20 at 04:04
