If I run the following code:

library(dplyr)

data.frame(items = seq(0, 1, by = .2)) %>% filter(items == .4)

data.frame(items = seq(0, 1, by = .2)) %>% filter(items == .6)

I get a result for the first filter, but an empty data frame for the second, as if it cannot find an entry equal to 0.6. I was in the middle of a complex script when I first noticed this, so I assumed it was something in my environment. Nope. I opened a completely new session, typed in these three lines, and got the same result.

I suspect that my software versions may be at the root of this. I am using R version 4.0.2 (2020-06-22) and dplyr 1.0.3. I do get the warning message "Warning message: package ‘dplyr’ was built under R version 4.0.3" whenever I load it, but I have a hard time believing that dplyr's backwards compatibility is that tenuous.

Any help greatly appreciated.

Aegis
  • `seq(0, 1, by = 0.2)[4] - 0.6` is not zero, it's `[1] 1.110223e-16`. – Rui Barradas Mar 05 '21 at 21:26
  • Holy cow. This is mind blowing. I need to read up on this. Thanks. – Aegis Mar 05 '21 at 21:28
  • Well, this error is so frequent that many R users know the number [FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) by heart. And you hit one of the most infamous cases: `seq(0, 1, by = 0.2)` gets `0.6` wrong :(. – Rui Barradas Mar 05 '21 at 21:31
  • Computers have limitations when it comes to floating-point numbers (aka `double`, `numeric`, `float`). This is a fundamental limitation in how they represent non-integer numbers, and it is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most mainstream languages (this is relative/subjective, I admit) do not use these by default. Other refs: https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754 (if insomniac). – r2evans Mar 05 '21 at 21:34
  • dplyr's `near()` function is the solution to my issue. Thanks all for putting me on the right track. – Aegis Mar 05 '21 at 22:45
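
For anyone landing here later, a minimal sketch of the fix mentioned in the comments, reusing the data frame from the question. The variable names (`df`, `exact`, `approx`, `base_r`) and the `1e-8` tolerance in the base R line are only illustrative choices, not anything taken from the thread.

```r
library(dplyr)

df <- data.frame(items = seq(0, 1, by = 0.2))

# Exact comparison: the value that seq() stores differs from the literal 0.6
# by about 1.1e-16, so no row matches.
exact <- df %>% filter(items == 0.6)
nrow(exact)   # 0

# near() compares within a tolerance (default: .Machine$double.eps^0.5).
approx <- df %>% filter(near(items, 0.6))
nrow(approx)  # 1

# Base R equivalent with an explicit, hand-picked tolerance (an assumption).
base_r <- df[abs(df$items - 0.6) < 1e-8, , drop = FALSE]
nrow(base_r)  # 1
```

The default tolerance of `near()` is usually fine for values of this magnitude; if the column holds very large or very small numbers, passing a `tol` suited to the data is probably the safer choice.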
