I have a large data frame (698764 x 9) whose format is similar to:

df <- data.frame(id = c(1, 2, 1, 4), units = c(2, 2, 2, 5), region = c("US", "CA", "US", "IN"))

As we can see, the first and third rows are identical. I would like to extract each duplicated row and count how many times it occurs in the data, so that the output would look like

duplicates <- data.frame(id = 1, units = 2, region = "US", times = 2)

where "times" is the number of times the row is duplicated.

I extracted the duplicated rows using

new_df <- df[duplicated(df),]

but I am not sure how to count the number of occurrences.

Jane Miller

1 Answer

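We can use `count` from dplyr, which returns one row per unique combination along with the number of times it occurs: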

library(dplyr)
df %>% 
    count(id, units, region, name = 'times')

-output

  id units region times
1  1     2     US     2
2  2     2     CA     1
3  4     5     IN     1

Or use `across(everything())` to count over all columns without naming them:

df %>% 
    count(across(everything()), name = 'times')

-output

  id units region times
1  1     2     US     2
2  2     2     CA     1
3  4     5     IN     1
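
The counts above include rows that occur only once. To keep only the duplicated rows, as in the desired `duplicates` output from the question, one option is to filter on the count. This is a minimal sketch that assumes dplyr 1.0.0 or later (for `across`) and reuses the example `df` from the question; the `filter` step is an addition, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(id = c(1, 2, 1, 4),
                 units = c(2, 2, 2, 5),
                 region = c("US", "CA", "US", "IN"))

# Count every unique row, then keep only rows that occur more than once
duplicates <- df %>%
    count(across(everything()), name = "times") %>%
    filter(times > 1)

print(duplicates)
```

For the example data this leaves a single row (id 1, units 2, region "US") with `times` equal to 2.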
akrun
  • Would I be able to use 1:3 to refer to all 3 columns instead of the column names? I have multiple data frames with about 9 columns each that I would like to look at, so listing the column names each time seems excessive. – Jane Miller Jul 29 '21 at 20:22
  • @JaneMiller yes, you can pass that in `across`, i.e. `across(1:3)` or `across(everything())` as in the update – akrun Jul 29 '21 at 20:24
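
As a rough illustration of the positional selection mentioned in the comments, the same count can be applied to several data frames at once with `lapply`. The second data frame `df2` and the list `dfs` are hypothetical, made up for this sketch, and the approach assumes every data frame has the columns of interest in positions 1 to 3.

```r
library(dplyr)

# Example data from the question
df <- data.frame(id = c(1, 2, 1, 4),
                 units = c(2, 2, 2, 5),
                 region = c("US", "CA", "US", "IN"))

# Hypothetical second data frame with the same column layout
df2 <- data.frame(id = c(7, 7, 8),
                  units = c(1, 1, 3),
                  region = c("CA", "CA", "US"))

dfs <- list(a = df, b = df2)

# Count unique rows in each data frame, selecting columns by position
counts <- lapply(dfs, function(d) {
    d %>% count(across(1:3), name = "times")
})

print(counts$a)
print(counts$b)
```

With about 9 columns per data frame, `across(1:9)` or `across(everything())` would play the same role.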