Getting rows where there are multiple of the same value

Question

I have an R data frame that looks something like this:

A    B          C
14   apple      45
14   bannaa     23
15   car        234
16   door       12
16   ear        325

As you can see, 14 and 16 are repeated in column A. I want to keep only the rows whose A value occurs more than once:

A    B          C
14   apple      45
14   bannaa     23
16   door       12
16   ear        325

So far I have table(DF$A) > 1, but what is the easiest way to get from there to the result I want?

David Arenburg · Accepted Answer · 2015-08-18T08:02:46.283

2

Here's another possible base R solution

indx <- with(df, ave(A, A, FUN = length))
df[indx > 1, ]
#    A      B   C
# 1 14  apple  45
# 2 14 bannaa  23
# 4 16   door  12
# 5 16    ear 325
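
To see why the `ave` call works, here is a minimal sketch that rebuilds the example data and prints the intermediate vector. The column types (numeric `A` and `C`, character `B`) and the name `group_size` are assumptions for illustration; they are not given in the question.

```r
# Rebuild the example data from the question (column types are assumed)
df <- data.frame(
  A = c(14, 14, 15, 16, 16),
  B = c("apple", "bannaa", "car", "door", "ear"),
  C = c(45, 23, 234, 12, 325)
)

# ave() applies FUN within each group of A and returns a vector as long
# as A, so every row ends up carrying the size of its own group
group_size <- ave(df$A, df$A, FUN = length)
group_size
# [1] 2 2 1 2 2

# Keep only the rows whose group has more than one member
result <- df[group_size > 1, ]
```

Note that this relies on `A` being numeric. If `A` were a factor, something like `ave(seq_along(df$A), df$A, FUN = length)` would likely be the safer variant.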

Or using the data.table package:

library(data.table)
setDT(df)[, .SD[.N > 1], by = A]
#     A      B   C
# 1: 14  apple  45
# 2: 14 bannaa  23
# 3: 16   door  12
# 4: 16    ear 325

or

setDT(df)[, if(.N > 1) .SD, by = A]

Finally, a bonus solution using rle

## df <- df[order(df$A), ] # If the data isn't sorted by `A`, you'll need to sort it first
indx <- rle(df$A)$lengths
df[rep(indx > 1, indx), ]
#    A      B   C
# 1 14  apple  45
# 2 14 bannaa  23
# 4 16   door  12
# 5 16    ear 325
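
The sorting caveat matters because `rle` only counts runs of adjacent equal values. A small sketch on bare vectors (the names `a_sorted`, `a_unsorted`, `runs` and `keep` are made up for illustration) shows the intermediate steps and what happens without sorting:

```r
a_sorted <- c(14, 14, 15, 16, 16)
runs <- rle(a_sorted)
runs$values
# [1] 14 15 16
runs$lengths
# [1] 2 1 2

# Expand one flag per run back into one flag per row
keep <- rep(runs$lengths > 1, runs$lengths)
keep
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

# With unsorted input, equal values that are not adjacent form separate
# runs, so nothing would be detected as repeated
a_unsorted <- c(14, 16, 14, 15, 16)
unsorted_lengths <- rle(a_unsorted)$lengths
unsorted_lengths
# [1] 1 1 1 1 1
```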

edited Aug 18 '15 at 08:02

answered Oct 30 '14 at 19:47


Hm, `.SD[.N > .]` seems quite common, isn't it? Time to optimise that then. – Arun Nov 03 '14 at 23:19
@Arun, I'm in (also see [here](http://stackoverflow.com/questions/26703764/find-duplicated-rows-with-original/26704121?s=1|0.0000#26704121)) send me an email with instructions :) – David Arenburg Nov 04 '14 at 09:14

akrun · Answer 2 · 2014-10-30T18:50:28.000

1

indx <- duplicated(df[, "A"]) | duplicated(df[, "A"], fromLast = TRUE)
df[indx,]
#   A      B   C
#1 14  apple  45
#2 14 bannaa  23
#4 16   door  12
#5 16    ear 325

edited Oct 30 '14 at 18:50

answered Oct 30 '14 at 18:43


what does fromLast do? Can you do it without the [,1] and just call the column names? – SuperString Oct 30 '14 at 18:47
@SuperString `fromLast = TRUE` makes `duplicated()` scan from the last element backwards. A plain `duplicated(df[,"A"])` marks only the second and later occurrences as TRUE, i.e. it leaves out the first one. Combining it with the reversed scan flags the first occurrence as well as all the others. – akrun Oct 30 '14 at 18:53
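
To make the `fromLast` explanation concrete, here is a small sketch on a bare vector holding the `A` values from the question (the variable names `a`, `forward`, `backward` and `either` are only for illustration):

```r
a <- c(14, 14, 15, 16, 16)

# Forward scan: the first occurrence of each value is not flagged
forward <- duplicated(a)
forward
# [1] FALSE  TRUE FALSE FALSE  TRUE

# Backward scan: the last occurrence of each value is not flagged
backward <- duplicated(a, fromLast = TRUE)
backward
# [1]  TRUE FALSE FALSE  TRUE FALSE

# OR-ing the two flags every element whose value occurs more than once
either <- forward | backward
either
# [1]  TRUE  TRUE FALSE  TRUE  TRUE
```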

score 1 · Answer 3 · answered Oct 30 '14 at 19:37

Since you already started with a different approach, here's how you could complete it:

x <- table(df$A)
df[df$A %in% names(x[x>1]),]
#   A      B   C
#1 14  apple  45
#2 14 bannaa  23
#4 16   door  12
#5 16    ear 325

This works because names(x) gives you the unique values of column A (as character strings). Subsetting the table with x > 1 keeps only the values that occur more than once, so names(x[x>1]) returns exactly those values, and %in% then selects the matching rows.
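
The intermediate values can be inspected step by step. This is only a sketch on a bare vector with the question's `A` values; the names `a`, `repeated` and `keep` are made up for illustration.

```r
a <- c(14, 14, 15, 16, 16)

# Frequency table of the values; its names are character strings
x <- table(a)
names(x)
# [1] "14" "15" "16"

# Keep only the entries that occur more than once
repeated <- names(x[x > 1])
repeated
# [1] "14" "16"

# %in% coerces the numeric values to character before matching
keep <- a %in% repeated
keep
# [1]  TRUE  TRUE FALSE  TRUE  TRUE
```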

And another option, in case you're already familiar with dplyr, would be:

library(dplyr)
df %>% group_by(A) %>% filter(n() > 1)
