2

This should be easy, but I couldn't find a way to do it. All I want to do is find the rows whose y value has a fractional part (i.e., is not a whole number). I have data like so:

> df <- data.frame(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
                 y = c(1.000, 2.000, 3.000, 4.000, 5.345, 1.000, 2.000, 3.000, 4.670))
> df
id      y
 a  1.000
 a  2.000
 a  3.000
 a  4.000
 a  5.345
 b  1.000
 b  2.000
 b  3.000
 b  4.670

Output required:

id      y
 a  5.345
 b  4.670

A tidyverse method would be preferred. Thanks!

Cloft X
  • 3
    Try `df %>% filter(!near(round(y), y))` – Darren Tsai Jul 18 '23 at 08:47
  • 2
    Relevant to [this post](https://stackoverflow.com/q/3476782/10068985) – Darren Tsai Jul 18 '23 at 08:53
  • 1
    A benchmark comparing `filter(!near(round(y), y))` and `filter(df, y %% 1 != 0)` would be interesting; I suspect the modulo approach is more computationally intensive. – mhovd Jul 18 '23 at 08:56
  • 1
    I did some benchmarking with `microbenchmark(filter(df, !near(round(y), y)), filter(df, y %% 1 != 0))`. With as few values as in the example, the difference in timings is negligible. With a data frame of a million rows, the modulo approach is considerably faster. Using `near()`, however, addresses the floating-point issue. – Chr Jul 18 '23 at 09:11
  • 1
    @mhovd It's unfair to compare `filter(df, !near(round(y), y))` with `filter(df, y %% 1 != 0)`. I suggested `near()` to deal with the floating-point issue, which `y %% 1 != 0` cannot handle. If you want to compare, you should do it against `filter(df, !near(y %% 1, 0))`. – Darren Tsai Jul 18 '23 at 10:30
  • 2
    see also `rlang::is_integerish()` – moodymudskipper Jul 18 '23 at 15:55
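The benchmark discussed in these comments could be set up roughly as follows. This is only a sketch: it assumes the dplyr and microbenchmark packages are installed, and `big_df`, its size, and the sampled values are made up for illustration. Timings depend on the machine, so none are shown here.

```r
library(dplyr)
library(microbenchmark)

set.seed(1)  # reproducible sample; the data below is hypothetical
n <- 1e6
big_df <- data.frame(
  id = sample(c("a", "b"), n, replace = TRUE),
  y  = sample(c(1, 2, 3, 4, 5.345, 4.67), n, replace = TRUE)
)

# The three filters suggested in the comments
res_round_near  <- filter(big_df, !near(round(y), y))
res_modulo      <- filter(big_df, y %% 1 != 0)
res_modulo_near <- filter(big_df, !near(y %% 1, 0))

# On this clean data all three should select the same rows
same_rows <- identical(res_round_near, res_modulo) &&
  identical(res_modulo, res_modulo_near)

# Time each approach; `times = 10` keeps the run short
bm <- microbenchmark(
  round_near  = filter(big_df, !near(round(y), y)),
  modulo      = filter(big_df, y %% 1 != 0),
  modulo_near = filter(big_df, !near(y %% 1, 0)),
  times = 10
)
print(bm)
```

Including the `modulo_near` variant follows the point made above that the tolerance-based filters should be compared with each other rather than with the exact comparison.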

2 Answers

3

Filter with modulo:

filter(df, !near(y %% 1, 0))
Chr
  • Could you elaborate on what you mean by "be careful with how numerics are stored"? – Cloft X Jul 18 '23 at 08:53
  • 1
    Numerics are double-precision floating-point numbers. Unlike integers, a floating-point number is not necessarily stored as exactly the decimal value that you entered, but as a value very close to it. Here is a funny video on that: https://youtube.com/shorts/s9F8pu5KfyM?feature=share. So when applying the modulo to a "non-fraction" value, the result is not necessarily exactly 0, but might be a number very close to 0. – Chr Jul 18 '23 at 09:01
  • 2
    I changed my answer based on Darren Tsai's suggestion of using `near`. That makes it computationally more expensive, but averts the floating-point precision issue. – Chr Jul 18 '23 at 09:17
  • 1
    I'm not sure if that can really happen in this case though – moodymudskipper Jul 18 '23 at 15:58
  • 1
    Also I assume this is using dplyr but then you should mention it, especially as base R has a different function named `filter()` – moodymudskipper Jul 18 '23 at 16:00
  • 1
    I did not mention that this uses dplyr because the question specifically asked for a tidyverse solution. – Chr Jul 18 '23 at 21:13
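To illustrate the floating-point point raised in these comments, here is a small base-R sketch. The values `3 + 1e-12` and `3 - 1e-12` are made up to stand in for results of earlier arithmetic that landed slightly off a whole number. `tol` mirrors the default tolerance of `dplyr::near()`, and the `abs(...) >= tol` expressions are hand-written equivalents of the `near()` calls so that the snippet runs without dplyr.

```r
# Default tolerance of dplyr::near(): .Machine$double.eps^0.5
tol <- sqrt(.Machine$double.eps)

# A whole number, two values a hair off a whole number, and a true fraction
y <- c(3, 3 + 1e-12, 3 - 1e-12, 5.345)

# Exact comparison: any deviation from a whole number counts as a fraction
exact_modulo <- y %% 1 != 0
exact_modulo   # FALSE  TRUE  TRUE  TRUE

# Equivalent of !near(y %% 1, 0): tolerates values slightly ABOVE a whole number
near_modulo <- abs(y %% 1) >= tol
near_modulo    # FALSE FALSE  TRUE  TRUE

# Equivalent of !near(round(y), y): tolerates deviations in both directions
near_round <- abs(round(y) - y) >= tol
near_round     # FALSE FALSE FALSE  TRUE
```

The two tolerance-based checks can disagree: a value just below a whole number has a remainder close to 1, not 0, so only the `round()` form treats it as whole. For values typed in as literals, as in the question's data, all three checks give the same answer.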
1

`as.character()` could probably do the trick here:

> subset(df, grepl(".", as.character(y), fixed = TRUE))
  id     y
5  a 5.345
9  b 4.670
ThomasIsCoding
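One caveat with the string-based approach, sketched in base R under default options (`scipen = 0`): very small or very large values may be converted to scientific notation, which need not contain a decimal point. The vector `y` below is made up for illustration and is not part of the question's data.

```r
# Hypothetical values: a whole number, a plain fraction,
# a tiny fraction, and a large whole number
y <- c(4, 5.345, 1e-05, 1e+05)

as_text <- as.character(y)  # with default options, 1e-05 becomes "1e-05"
as_text

# String-based check: is there a literal "." in the text form?
has_dot <- grepl(".", as_text, fixed = TRUE)

# Numeric check for comparison: is the remainder after dividing by 1 non-zero?
has_fraction <- y %% 1 != 0

data.frame(y, as_text, has_dot, has_fraction)
```

Here the third value has a fractional part but no `"."` in its text form, so the `grepl()` filter would drop it. For data like the example in the question this does not arise.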