Find how many times duplicated rows repeat in R data frame

Question

I have a data frame like the following example:

a = c(1, 1, 1, 2, 2, 3, 4, 4)
b = c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

I can remove duplicated rows from an R data frame with the following code, but how can I find how many times each duplicated row is repeated? I need the result as a vector.

unique(df)

or

df[!duplicated(df), ]

This is not a duplicate of "Count number of rows within each group". This one is counting duplicates, the other question is counting how many in each group (and the rows in a group do not have to be duplicates of each other). — Eric Krantz, Feb 28 '22 at 23:38

Didzis Elferts · Accepted Answer · 2013-08-13T05:26:03.613

34

Here is a solution using the function ddply() from the plyr package:

library(plyr)
ddply(df,.(a,b),nrow)

  a   b V1
1 1 2.5  1
2 1 3.5  2
3 2 2.0  2
4 3 1.0  1
5 4 2.2  1
6 4 7.0  1

edited Aug 13 '13 at 05:26

answered Aug 13 '13 at 05:17


You could save a few characters by replacing ``function(x) nrow(x)`` with just ``nrow``. – orizon Aug 13 '13 at 05:24
Is it at all possible to recreate this with dplyr? – maj Apr 30 '14 at 10:22
@maj I haven't used dplyr so can't answer – Didzis Elferts Apr 30 '14 at 11:55
Is there a solution that's agnostic to the columns a, b (i.e., one that uses all columns)? – 3pitt Oct 10 '17 at 18:24
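
On the column-agnostic question: ddply() accepts a character vector of column names as its grouping argument, so passing names(df) groups by every column. A minimal sketch, assuming the question's df; the name dup_all is only illustrative:

```r
library(plyr)

a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# names(df) supplies every column as a grouping variable,
# so no column has to be spelled out by hand
dup_all <- ddply(df, names(df), nrow)

# The counts alone, as a plain vector
# (ddply() names the unnamed result column V1)
dup_all$V1
```

This should give the same table as ddply(df, .(a, b), nrow) for this data, and the last line returns the counts as the vector the question asks for.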

thelatemail · Answer 2 · 2016-07-26T03:04:14.733

25

You could always kill two birds with one stone:

aggregate(list(numdup=rep(1,nrow(df))), df, length)
# or even:
aggregate(numdup ~., data=transform(df,numdup=1), length)
# or even:
aggregate(cbind(df[0],numdup=1), df, length)

  a   b numdup
1 3 1.0      1
2 2 2.0      2
3 4 2.2      1
4 1 2.5      1
5 1 3.5      2
6 4 7.0      1

edited Jul 26 '16 at 03:04

answered Aug 13 '13 at 05:20


Could you please explain the reason behind replication `aggregate(list(numdup=rep(1,nrow(df))), df, length)` ? – DukeLover Jun 02 '17 at 11:43
@dukelover - aggregate needs the column(s) being summed to be the same length as the grouping variables, so I just repeat 1 to get this. – thelatemail Jun 02 '17 at 21:51
thanks a lot for your reply. Can you please explain this code `aggregate(numdup ~., data=transform(df,numdup=1), length) ` ? -- Here what is the significance of `numdup ~` ? – DukeLover Jun 03 '17 at 04:08
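
On the formula question: in `numdup ~ .`, the left-hand side names the column to summarise, and the `.` on the right-hand side stands for every other column in `data`, so the rows are grouped by all of the original columns. A minimal sketch, assuming the question's df, that runs the formula variant next to the list variant; the names df1, res_formula and res_list are only illustrative:

```r
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# Add a constant helper column; only its length per group matters
df1 <- transform(df, numdup = 1)

# Left-hand side: the column to summarise.
# "." on the right-hand side: all remaining columns of df1 (here a and b)
res_formula <- aggregate(numdup ~ ., data = df1, FUN = length)

# The same grouping with the grouping columns passed explicitly
res_list <- aggregate(list(numdup = rep(1, nrow(df))), df, length)

# The counts alone, as a plain vector
res_formula$numdup
```

Because FUN is length, the value stored in the helper column is irrelevant; any constant would produce the same counts.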

score 15 · Answer 3 · answered Aug 13 '13 at 05:30

Here are two approaches.

# an example data set that is not sorted
DF <- data.frame(replicate(sequence(1:3), n = 2))

# example using a similar idea to duplicated.data.frame
count.duplicates <- function(DF) {
  # collapse each row into a single string key
  x <- do.call('paste', c(DF, sep = '\r'))
  ox <- order(x)
  rl <- rle(x[ox])
  # keep the last row of each run of identical keys; the run length is the count
  cbind(DF[ox[cumsum(rl$lengths)], , drop = FALSE], count = rl$lengths)
}
count.duplicates(DF)
#   X1 X2 count
# 4  1  1     3
# 5  2  2     2
# 6  3  3     1


# a far simpler `data.table` approach
library(data.table)
count.dups <- function(DF) {
  DT <- data.table(DF)
  DT[, .N, by = names(DT)]
}
count.dups(DF)
#    X1 X2 N
# 1:  1  1 3
# 2:  2  2 2
# 3:  3  3 1
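
The paste-key idea from the first approach can also return the counts as a plain vector, which is what the question asks for. A minimal base R sketch, assuming the question's df; the names row_key, counts_per_row and counts_unique are illustrative and not part of the answer above:

```r
a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# One string key per row, built the same way as in count.duplicates()
row_key <- do.call(paste, c(df, sep = "\r"))

# For every row, the number of rows sharing its key (length nrow(df))
counts_per_row <- ave(seq_along(row_key), row_key, FUN = length)

# One count per distinct row, in order of first appearance,
# lined up with df[!duplicated(df), ]
counts_unique <- counts_per_row[!duplicated(row_key)]
```

One caveat: the keys rely on how paste() converts numbers to text, so numeric values that differ only beyond the printed precision would be treated as equal.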

Your first solution is terrific and, at the same time, terrifying: every time I think about the function, it's a nightmare. – PesKchan, Aug 28 '21 at 22:38

HywelMJ · Answer 4 · 2014-09-17T19:42:52.203

12

Using dplyr:

library(dplyr)
summarise(group_by(df, a, b), length(b))

or

group_size(group_by(df,a,b))
#[1] 1 2 2 1 1 1
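
As a further variant, dplyr's count() combines the grouping and the tally in one call. A minimal sketch, assuming the question's df and a dplyr version in which count() has the name argument; dup_counts, dup_vec and the column name n_dup are illustrative choices:

```r
library(dplyr)

a <- c(1, 1, 1, 2, 2, 3, 4, 4)
b <- c(3.5, 3.5, 2.5, 2, 2, 1, 2.2, 7)
df <- data.frame(a, b)

# One row per distinct (a, b) combination, with the count in column n_dup
dup_counts <- count(df, a, b, name = "n_dup")

# The counts alone, as a plain vector
dup_vec <- dup_counts$n_dup
```

Unlike group_size(), this keeps the key columns next to the counts, so each count can be traced back to the row it belongs to.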

edited Sep 17 '14 at 19:42

answered Sep 16 '14 at 20:05


Don't forget about the pipe! df %>% group_by(a, b) %>% group_size() – Daniel Chen May 07 '15 at 17:32
Or `df %>% group_by_all() %>% count` – jtr13 Nov 06 '21 at 14:03
