How to remove duplicate rows in R based on condition?

Question

I have the following data:

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

 id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1

Some ids have only x = 0 rows. When an id does have x = 1, it occurs only once, and always in the last row for that id. I want to remove duplicate rows for each id, but if an id has an x = 1 row, I want to keep only that row.

The desired output:

 id x
001 0
002 1
003 1

A tidyverse solution is preferable. Thanks!

Perhaps the title of my question could be edited to make it more helpful to locate. — Cloft X, Aug 04 '23 at 15:25

score 5 · Answer 1 · answered Aug 04 '23 at 15:18

In base R, you could use the aggregate() function, taking the maximum of x within each id:

aggregate(x ~ id, df, max)
   id x
1 001 0
2 002 1
3 003 1
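This works because x only takes the values 0 and 1, so the per-id maximum is 1 exactly when an x = 1 row exists. Note that aggregate() returns only id and x. If the real data had further columns (an assumption, the question shows none), a possible base R sketch that keeps whole rows is to sort by descending x within id and drop repeated ids. The names res_agg, ord and res_rows are only illustrative.

```r
# Sample data from the question
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Per-id maximum of x: 1 if the id has an x = 1 row, otherwise 0
res_agg <- aggregate(x ~ id, df, max)

# Alternative that keeps whole rows: put the largest x first within
# each id, then keep only the first row seen for each id
ord <- df[order(df$id, -df$x), ]
res_rows <- ord[!duplicated(ord$id), ]

print(res_agg)
print(res_rows)
```

The second result keeps the original row names of the retained rows, which can be reset with rownames(res_rows) <- NULL if that matters.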

answered Aug 04 '23 at 15:18

Onyambu

ThomasIsCoding · Accepted Answer · 2023-08-04T15:14:13.590

4

You could probably use slice_max():

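library(dplyr)  # the `by` argument needs dplyr >= 1.1.0

df %>%
    slice_max(x, by = id) %>%   # rows with the largest x within each id
    distinct()                  # collapse the identical x = 0 rows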

or, as suggested by @r2evans in the comments,

df %>%
    slice_max(x, by = id, with_ties = FALSE)

which gives

   id x
1 001 0
2 002 1
3 003 1
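For anyone who wants to run both variants side by side, here is a self-contained sketch on the sample data from the question. The names res_distinct and res_no_ties are only illustrative, and it assumes dplyr 1.1.0 or later is installed, since that is where the by argument was introduced.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `by` argument

# Sample data from the question
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Variant 1: keep all rows tied for the largest x per id, then deduplicate
res_distinct <- df %>%
    slice_max(x, by = id) %>%
    distinct()

# Variant 2: keep a single row per id even when several rows tie
res_no_ties <- df %>%
    slice_max(x, by = id, with_ties = FALSE)

print(res_distinct)
print(res_no_ties)
```

With with_ties = FALSE, only one of the tied rows is kept. That is harmless here because tied rows within an id are identical, but with extra columns it would decide which row survives.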

edited Aug 04 '23 at 15:14

answered Aug 04 '23 at 15:11

ThomasIsCoding

3

You can use `with_ties=FALSE` for a simpler `slice_max(df, x, by = id, with_ties = FALSE)` – r2evans Aug 04 '23 at 15:12
2

@r2evans Thanks for the contribution! Yes, that's more concise! – ThomasIsCoding Aug 04 '23 at 15:14
