I have a data.table which I want to filter to find only rows of data where fresh_flow==0.187677777777778.

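library(dplyr)  # provides %>% and filter()

# data.table() builds a table from named columns; as.data.table() does not accept them
df <- data.table::data.table(fresh_flow = rep(c(0.59347556, 0.05940667, 0.18767778, 1.87677778, 0.01876778), each = 1e2))

df %>%
  dplyr::filter(fresh_flow == 0.187677777777778)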

This returns the rows I expect. With my actual data (MyData), the same filter returns an empty table, even though the column is also numeric and contains the same unique values. Why?

df %>% str()
Classes ‘data.table’ and 'data.frame':  2000 obs. of  1 variable:
 $ fresh_flow: num  0.188 0.188 0.188 0.188 0.188 ...
 - attr(*, ".internal.selfref")=<externalptr> 

MyData$fresh_flow %>% str()
 num [1:10499995] 0.5935 0.5935 0.0594 0.0594 0.0594

unique(MyData$fresh_flow)
[1] 0.59347556 0.05940667 0.18767778 1.87677778 0.01876778
Dharman
HCAI
  • `0.01876778` & `0.187677777777778` are not exactly the same, are they? – Sinh Nguyen Feb 16 '21 at 13:42
  • Computers have limitations when it comes to floating-point numbers (aka `double`, `numeric`, `float`). This is a fundamental limitation of computers in general, in how they deal with non-integer numbers. This is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most main-stream languages (this is relative/subjective, I admit) do not use these by default. Refs: https://stackoverflow.com/q/9508518, https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754 – r2evans Feb 16 '21 at 13:46
  • If you want to find that number, you need to look for numbers that are close *with tolerance*, such as `abs(fresh_flow - 0.187677778) < 1e-13` or similar. – r2evans Feb 16 '21 at 13:46
  • That is true. How could you build into this a rounding such that it doesn't matter which value you choose as long as the first three digits were the same without altering the data? – HCAI Feb 16 '21 at 13:47
  • @r2evans thank you that makes sense. How would you put that inequality into the filter function? – HCAI Feb 16 '21 at 13:50
  • Just wrap it? `dplyr::filter(abs(a-b) – r2evans Feb 16 '21 at 13:50
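
The tolerance idea from the comments can be sketched roughly as below. This is only a minimal sketch, not the asker's real data: `my_data`, `target`, `tol`, and the `1e-12` offset are hypothetical stand-ins, chosen so that the stored value prints like `0.18767778` at R's default 7 significant digits while differing from the literal further out. The appropriate tolerance depends on how the real values were computed, so `1e-9` is an assumption too. `dplyr::near()` is an existing dplyr helper that performs the same kind of comparison.

```r
library(data.table)
library(dplyr)

target <- 0.187677777777778

# Hypothetical stand-in for MyData: the third value differs from `target`
# by 1e-12, which default printing (7 significant digits) does not show.
my_data <- data.table(
  fresh_flow = rep(
    c(0.59347556, 0.05940667, target + 1e-12, 1.87677778, 0.01876778),
    each = 100
  )
)

# Ask for more digits to see what is actually stored
print(unique(my_data$fresh_flow), digits = 15)

# Exact equality misses the rows because the doubles are not identical
exact <- my_data %>% filter(fresh_flow == target)

# Tolerance-based filter; `tol` is an assumed value, adjust to your data
tol <- 1e-9
near_target <- my_data %>% filter(abs(fresh_flow - target) < tol)

# Same comparison using dplyr's helper
near_helper <- my_data %>% filter(near(fresh_flow, target, tol = tol))

# Same comparison in data.table syntax
near_dt <- my_data[abs(fresh_flow - target) < tol]

# Compare at three decimal places without modifying the stored column
rounded <- my_data %>% filter(round(fresh_flow, 3) == round(target, 3))
```

In this sketch the exact filter keeps no rows, while the tolerance-based variants each keep the 100 rows belonging to the third value. The rounding variant only rounds inside the comparison, so the `fresh_flow` column itself is left unchanged.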

0 Answers