How to find occurrences of fractional (non-integer) values in a column in R?

Question

This should be easy, but I couldn't find a way to do it. All I want is to find the rows with fractional (non-integer) values in y. I have data like this:

> df <- data.frame(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
                 y = c(1.000, 2.000, 3.000, 4.000, 5.345, 1.000, 2.000, 3.000, 4.670))
> df
id      y
 a  1.000
 a  2.000
 a  3.000
 a  4.000
 a  5.345
 b  1.000
 b  2.000
 b  3.000
 b  4.670

Output required:

id      y
 a  5.345
 b  4.670

A tidyverse method would be preferred. Thanks!

Relevant to [this post](https://stackoverflow.com/q/3476782/10068985) — Darren Tsai, Jul 18 '23 at 08:53
A benchmark comparing `filter(!near(round(y), y))` and `filter(df, y %% 1 != 0)` would be interesting; I suspect the modulo approach is more computationally intensive. — mhovd, Jul 18 '23 at 08:56
I did some benchmarking with `microbenchmark(filter(df, !near(round(y), y)), filter(df, y %% 1 != 0))`. With as few values as in the example, the difference is negligible. With a data frame of a million rows, the modulo approach is considerably faster. Using `near`, however, addresses the floating-point issue. — Chr, Jul 18 '23 at 09:11
@mhovd It's unfair to compare `filter(df, !near(round(y), y))` and `filter(df, y %% 1 != 0)`. I suggested `near()` to deal with the floating-point issue, which `y %% 1 != 0` cannot handle. If you want to compare, you should do it against `filter(df, !near(y %% 1, 0))`. — Darren Tsai, Jul 18 '23 at 10:30
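
The comparison discussed in these comments could be set up roughly as below. This is only a sketch: the data frame `big`, its size, and its contents are assumptions made for illustration (they are not the data Chr used), and the timing step runs only if the microbenchmark package happens to be installed. No timings are shown, since they depend on the machine.

```r
library(dplyr)

# Assumed test data: one million values, about a third of them whole numbers.
# The fractions 0.25 and 0.5 are exactly representable in binary, so all
# three filters should select the same rows here.
set.seed(1)
n <- 1e6
big <- data.frame(
  y = sample(1:100, n, replace = TRUE) +
      sample(c(0, 0.25, 0.5), n, replace = TRUE)
)

by_modulo      <- filter(big, y %% 1 != 0)
by_near_round  <- filter(big, !near(round(y), y))
by_near_modulo <- filter(big, !near(y %% 1, 0))

# Sanity check before timing: the three approaches agree on this data.
same_rows <- identical(by_modulo$y, by_near_round$y) &&
  identical(by_modulo$y, by_near_modulo$y)
print(same_rows)

# Timing is optional and skipped when microbenchmark is not available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    modulo      = filter(big, y %% 1 != 0),
    near_round  = filter(big, !near(round(y), y)),
    near_modulo = filter(big, !near(y %% 1, 0)),
    times = 10
  ))
}
```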

Chr · Accepted Answer · 2023-07-18T09:15:44.513

Filter with modulo:

library(dplyr)
filter(df, !near(y %% 1, 0))
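
For completeness, here is a self-contained version of the same filter applied to the data from the question. The `dplyr::` prefixes are an addition for illustration, so that base R's `stats::filter()` cannot be picked up by accident; the variable name `result` is likewise just a placeholder.

```r
library(dplyr)

df <- data.frame(
  id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  y  = c(1.000, 2.000, 3.000, 4.000, 5.345, 1.000, 2.000, 3.000, 4.670)
)

# Keep rows whose remainder after division by 1 is not (nearly) zero.
result <- dplyr::filter(df, !dplyr::near(y %% 1, 0))
print(result)
#   id     y
# 1  a 5.345
# 2  b 4.670
```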

edited Jul 18 '23 at 09:15

answered Jul 18 '23 at 08:50


Could you elaborate on what you mean by "be careful with how numerics are stored"? – Cloft X Jul 18 '23 at 08:53
Numerics are double precision floating point numbers. Unlike integers, the computer does not necessarily store a floating point number as exactly the decimal value that you entered, but a value very close to it. Here is a funny video on that: https://youtube.com/shorts/s9F8pu5KfyM?feature=share. So when applying the modulo to a "non-fraction" value, the result is not necessarily exactly 0, but might be a number very close to 0. – Chr Jul 18 '23 at 09:01
I changed my answer based on Darren Tsai's suggestion of using `near`. That makes it computationally more expensive, but avoids the floating-point precision issue. – Chr Jul 18 '23 at 09:17
I'm not sure if that can really happen in this case though – moodymudskipper Jul 18 '23 at 15:58
Also I assume this is using dplyr but then you should mention it, especially as base R has a different function named `filter()` – moodymudskipper Jul 18 '23 at 16:00
I did not mention that this uses dplyr because the question specifically asked for a tidyverse solution. – Chr Jul 18 '23 at 21:13
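
The floating-point point raised in these comments can be illustrated with a short base R sketch. The helper `is_near()` is a hypothetical stand-in for `dplyr::near()` (same default tolerance, `.Machine$double.eps^0.5`), written out so the snippet runs without dplyr. A whole number typed as a literal, such as `3.000`, is stored exactly, so the issue tends to show up only for computed values. The sketch also suggests that a computed value landing just below an integer leaves a remainder close to 1 rather than 0, which the `near(y %% 1, 0)` check does not absorb, while the `near(round(y), y)` form from the question comments does.

```r
# Hypothetical stand-in for dplyr::near(), using the same default tolerance.
is_near <- function(x, y, tol = .Machine$double.eps^0.5) abs(x - y) < tol

y <- c(
  3.000,         # typed literal: stored exactly
  sqrt(2)^2,     # mathematically 2, stored slightly above 2
  49 * (1 / 49)  # mathematically 1, stored slightly below 1
)

checks <- data.frame(
  value        = c("3.000", "sqrt(2)^2", "49 * (1 / 49)"),
  exact_modulo = y %% 1 == 0,            # strict test for "whole number"
  near_modulo  = is_near(y %% 1, 0),     # remainder within tolerance of 0
  near_round   = is_near(round(y), y)    # value within tolerance of its rounding
)
print(checks)
```

On this input, only the `near_round` column treats all three values as whole numbers.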

score 1 · Answer 2 · answered Jul 19 '23 at 06:47

`as.character()` could probably do the trick here:

> subset(df, grepl(".", as.character(y), fixed = TRUE))
  id     y
5  a 5.345
9  b 4.670
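
One caveat with the string approach, sketched here in base R under the assumption that very small values might occur in `y` (the question's data has none): `as.character()` switches to scientific notation for such values, so the text contains no "." even though the number is fractional. The vector below is made up for illustration.

```r
# Illustrative values only: a whole number, an ordinary fraction, a tiny fraction.
y <- c(1, 5.345, 1e-05)

as_text   <- as.character(y)                  # "1" "5.345" "1e-05"
by_string <- grepl(".", as_text, fixed = TRUE) # looks for a literal "."
by_modulo <- y %% 1 != 0                       # arithmetic check for comparison

print(data.frame(as_text, by_string, by_modulo))
```

The last value is flagged by the modulo check but missed by the string check, so the `grepl()` approach seems safest when the values are known to print in fixed notation.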

answered Jul 19 '23 at 06:47

ThomasIsCoding

