I have the following data:

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

 id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1

Some ids may have only x = 0 rows. When an id does have x = 1, it occurs exactly once, and always in the last row for that id. I want to collapse the data to one row per id: if an id has an x = 1 row, keep only that row; otherwise keep a single x = 0 row.

The desired output:

 id x
001 0
002 1
003 1

A tidyverse solution is preferable. Thanks!

Cloft X

2 Answers

In base R, you could use the aggregate() function:

aggregate(x ~ id, df, max)
   id x
1 001 0
2 002 1
3 003 1
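
The reason max works here can be checked directly. Because x only takes the values 0 and 1, the per-id maximum is 1 exactly when that id has an x = 1 row, and 0 otherwise. A minimal sketch, assuming the df from the question (the names res and expected are only for illustration):

```r
# Data from the question
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# x is 0/1, so max(x) within an id is 1 only if that id has an x = 1 row
res <- aggregate(x ~ id, data = df, FUN = max)

# Desired output from the question, typed in by hand for comparison
expected <- data.frame(id = c("001", "002", "003"), x = c(0, 1, 1))

# Stops with an error if the result differs from the desired output
stopifnot(all(res$id == expected$id), all(res$x == expected$x))
```

Note that the formula x ~ id keeps only the id and x columns, so this fits best when there are no other columns that need to be carried along.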
Onyambu

You could probably use slice_max():

df %>%
    slice_max(x, by = id) %>%
    distinct()

or, as suggested in the comments by @r2evans:

df %>%
    slice_max(x, by = id, with_ties = FALSE)

which gives

   id x
1 001 0
2 002 1
3 003 1
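
The two variants differ in how they handle ties, which may matter if the real data has more columns. A short sketch, assuming the df from the question and dplyr 1.1.0 or later (needed for the by argument); the names kept_ties and no_ties are only for illustration:

```r
library(dplyr)

# Data from the question
df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Default with_ties = TRUE: all rows tied for the maximum are kept,
# so id "001" still contributes its three x = 0 rows
kept_ties <- df %>%
    slice_max(x, by = id)

# with_ties = FALSE: a single row per id, no distinct() needed
no_ties <- df %>%
    slice_max(x, by = id, with_ties = FALSE)

nrow(kept_ties)  # 5 rows before distinct()
nrow(no_ties)    # 3 rows
```

With only id and x present, distinct() collapses the tied rows to one. If the data had additional columns whose values differ between the tied rows, distinct() would keep all of them, so with_ties = FALSE is likely the safer choice when exactly one row per id is required.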
ThomasIsCoding