Deleting rows that are duplicated in one column based on value in another column

Question

A similar question was asked here. However, I could not adapt that solution to my particular problem, hence the separate question.

An example dataset:

I would like to delete all rows whose id is duplicated and whose group has the value 998. In this example, only row 2 should be deleted.
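A data frame consistent with the outputs shown in the answers below could be built as follows. The values are inferred from those outputs, not copied from the original post, so treat them as an assumption.

```r
# Assumed example data, inferred from the outputs in the answers:
# row 2 is the only row with a repeated id and group == 998
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(5L, 998L, 2L, 3L, 998L)
)

df
```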

I tried something along these lines:

df1 <- df %>%
  subset((unique(by = "id") |  group != 998))

but got

Error in is.factor(x) : Argument "x" is missing, with no default

Thank you in advance

Try replacing subset with the filter function from dplyr (following the package's documentation) and use group_by(id), as Sotos suggested a moment ago. – mgd6, Jun 20 '22 at 10:30

Sotos · Accepted Answer · 2022-06-20T11:00:15.357

Here is an idea: group by id and drop every group that has more than one row and contains a 998.

library(dplyr)

df %>%
  group_by(id) %>%
  # drop the whole id if it occurs more than once and any of its rows is 998
  filter(!any(n() > 1 & group == 998))

# A tibble: 3 x 2
# Groups:   id [2]
     id group
  <int> <int>
1     2     2
2     2     3
3     3   998

If you only want to remove the 998 rows from such groups and keep the other rows, drop the any():

df %>%
  group_by(id) %>%
  # drop only the 998 rows whose id occurs more than once
  filter(!(n() > 1 & group == 998))
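To make the difference between the two filters concrete, here is a minimal sketch that runs both on the same data. The data frame is an assumption, inferred from the outputs shown in this thread; `drop_ids` and `drop_rows` are illustrative names only.

```r
library(dplyr)

# Assumed example data, inferred from the outputs shown in this thread
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(5L, 998L, 2L, 3L, 998L)
)

# With any(): the whole id is dropped once one of its rows matches
drop_ids <- df %>%
  group_by(id) %>%
  filter(!any(n() > 1 & group == 998)) %>%
  ungroup()

# Without any(): only the matching rows are dropped
drop_rows <- df %>%
  group_by(id) %>%
  filter(!(n() > 1 & group == 998)) %>%
  ungroup()

nrow(drop_ids)   # 3: id 1 is gone entirely
nrow(drop_rows)  # 4: only the (1, 998) row is gone
```

The single 998 row for id 3 survives in both cases because n() is 1 for that group.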

edited Jun 20 '22 at 11:00

answered Jun 20 '22 at 10:30

Sotos


I applied this solution but left out `any` (as in your comment on TarJae's answer), on both the example and my original dataset, and it appears to produce the desired output. Thank you very much. – Calvin_Hobbes Jun 20 '22 at 10:54

score 1 · Answer 2 · answered Jun 20 '22 at 10:31

One way could be:

library(dplyr)

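# rows whose id has already appeared earlier and whose group is 998
df1 <- df %>%
  filter(duplicated(id) & group == 998)

# keep every row of df that has no match in df1
anti_join(df, df1)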

Joining, by = c("id", "group")
  id group
1  1     5
3  2     2
4  2     3
5  3   998
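One caveat, offered as a sketch, not as part of the original answer: `duplicated()` flags only the second and later occurrences of a value, so a 998 row that comes first within its id would not be caught. The data frame `df2` below is a hypothetical reordering of the example, and combining `duplicated()` with `fromLast = TRUE` is one possible way to flag every row of a repeated id.

```r
# Hypothetical variant of the example data: the 998 row comes first within id 1
df2 <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(998L, 5L, 2L, 3L, 998L)
)

# duplicated() marks only the second and later occurrences of each id
first_pass <- duplicated(df2$id) & df2$group == 998
sum(first_pass)  # 0, so nothing would be removed here

# Checking from both ends marks every row whose id occurs more than once
dup_any <- duplicated(df2$id) | duplicated(df2$id, fromLast = TRUE)
res <- df2[!(dup_any & df2$group == 998), ]
res
```

The same logical expression can be used inside `filter()` if you prefer to stay in dplyr.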

answered Jun 20 '22 at 10:31

TarJae


Not sure about the OP's expected output, but if this is the case you can simply do `df %>% group_by(id) %>% filter(!(n() > 1 & group == 998))`. I added `any` in my answer to remove the whole group when it satisfies those criteria. – Sotos Jun 20 '22 at 10:32
Thanks Sotos. I am also not sure. I guess we wait. – TarJae Jun 20 '22 at 10:33
I applied the solution as laid out in Sotos's comment and it worked. Thank you very much to both of you! – Calvin_Hobbes Jun 20 '22 at 10:54
@Sotos please add the comment to your answer. +1 – TarJae Jun 20 '22 at 10:55
