Trying to find how many times a row is duplicated in R

Question

I have a large dataframe (698764 X 9) that looks similar in format to:

df <- data.frame(id = c(1, 2, 1, 4), units = c(2, 2, 2, 5), region = c("US", "CA", "US", "IN"))

As we can see, the first and third rows are exactly the same. I would like to extract the duplicated row and then count how many times it occurs in the data, so that the output would look like

duplicates <- data.frame(id = 1, units = 2, region = "US", times = 2)

where "times" is the number of times the row is duplicated.

I extracted the duplicated rows using

new_df <- df[duplicated(df),]

but I am not sure how to count the number of occurrences.

akrun · Answer 1 · 2021-07-29T20:24:14.660


We can use count from dplyr, which tallies every distinct combination of the listed columns

library(dplyr)
df %>% 
    count(id, units, region, name = 'times')

-output

   id units region times
1  1     2     US     2
2  2     2     CA     1
3  4     5     IN     1

Or use across(everything()) so the column names do not have to be listed

df %>% 
    count(across(everything()), name = 'times')

-output

  id units region times
1  1     2     US     2
2  2     2     CA     1
3  4     5     IN     1
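
The question asks for only the rows that occur more than once, whereas the output above also lists the unique rows. A minimal sketch, assuming dplyr 1.0 or later and the example df from the question, is to filter the counted result on times > 1 (the name duplicates is taken from the question's desired output):

```r
library(dplyr)

# Example data from the question
df <- data.frame(id = c(1, 2, 1, 4), units = c(2, 2, 2, 5),
                 region = c("US", "CA", "US", "IN"))

# Count identical rows, then keep only rows that appear more than once
duplicates <- df %>%
    count(across(everything()), name = 'times') %>%
    filter(times > 1)

duplicates
```

With the example data this should leave a single row (id 1, units 2, region US) with times equal to 2.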

edited Jul 29 '21 at 20:24

answered Jul 29 '21 at 20:16


Would I be able to substitute 1:3 for all 3 columns somehow instead of the column names? Given that I have multiple dataframes with about 9 columns each that I would like to look at so listing the column names each time seems extensive – Jane Miller Jul 29 '21 at 20:22
@JaneMiller yes, you can pass that in `across`, i.e. `across(1:3)` or `across(everything())` as in the update – akrun Jul 29 '21 at 20:24
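
For several data frames, as raised in the comment above, one option is to wrap the call in a small function and apply it to a list. This is only a sketch: count_duplicates is a hypothetical helper name, df2 is made-up illustrative data, and it assumes none of the data frames already has a column called times.

```r
library(dplyr)

# Hypothetical helper: count identical rows across all columns and
# keep only the rows that occur more than once
count_duplicates <- function(data) {
    data %>%
        count(across(everything()), name = 'times') %>%
        filter(times > 1)
}

# df is the example from the question; df2 is invented for illustration
df  <- data.frame(id = c(1, 2, 1, 4), units = c(2, 2, 2, 5),
                  region = c("US", "CA", "US", "IN"))
df2 <- data.frame(id = c(7, 7, 7), units = c(1, 1, 3),
                  region = c("CA", "CA", "CA"))

# Apply the same helper to every data frame in a named list
results <- lapply(list(first = df, second = df2), count_duplicates)

results$first
results$second
```

Because across(everything()) selects whatever columns each data frame has, the helper does not need the column names or positions spelled out.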
