I have the following data frame:
> df1 <- data.frame("valA" = c(1,1,1,1,2,1,3,3,3), "valB" = c(1,2,3,1,2,3,1,2,3), "Score" = c(100,90,80,100, 60,80,10,20,30))
> df1
  valA valB Score
1    1    1   100
2    1    2    90
3    1    3    80
4    1    1   100
5    2    2    60
6    1    3    80
7    3    1    10
8    3    2    20
9    3    3    30
And I want only the duplicated rows. The expected result is:
  valA valB Score
1    1    1   100
2    1    3    80
3    1    1   100
4    1    3    80
I know that dplyr::distinct can keep only the unique rows, but I need to know which rows are duplicated, not remove the duplicates from the data frame. I tried base R's duplicated function, but it is too slow since my data is large (more than 20 million rows). I also tried:
duplicated_df1 <- df1 %>% group_by(valA, valB, Score) %>% filter(n() > 1)
which gives the expected result above, but again it is too slow and I don't have enough RAM. Can anyone suggest an efficient, fast method to find the duplicated rows?
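For reference, this is a minimal sketch of the base-R duplicated approach I mean: duplicated() alone only flags the later copies, so I combine it with fromLast = TRUE to flag every row that has at least one identical twin.

```r
df1 <- data.frame(valA  = c(1, 1, 1, 1, 2, 1, 3, 3, 3),
                  valB  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  Score = c(100, 90, 80, 100, 60, 80, 10, 20, 30))

# duplicated() marks later copies; fromLast = TRUE marks the earlier ones.
# The OR of both flags every row that occurs more than once.
dup_idx <- duplicated(df1) | duplicated(df1, fromLast = TRUE)
duplicated_df1 <- df1[dup_idx, ]
duplicated_df1
#   valA valB Score
# 1    1    1   100
# 3    1    3    80
# 4    1    1   100
# 6    1    3    80
```

This gives the expected four rows on the small example (with the original row names kept), but on my real data it is still too slow.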