I have a `data.table` with a column `foo`. I'd like to get all the rows whose value in `foo` appears more than once.

I thought `dt[duplicated(dt$foo),]` was supposed to do that, but for each value in `foo` that has duplicates, it omits the first occurrence and returns only the later rows.

Here is an example to make this clearer:

> dt <- data.table(id = c(1,2,3,4,5,6,7,8,9), foo = c("a","b","b","b","c","c","d","e","e"))
> print(dt) 
   id foo
1:  1   a
2:  2   b
3:  3   b
4:  4   b
5:  5   c
6:  6   c
7:  7   d
8:  8   e
9:  9   e

> dt[duplicated(dt$foo),]
   id foo
1:  3   b
2:  4   b
3:  6   c
4:  9   e

Whereas I would like:

   id foo
2:  2   b
3:  3   b
4:  4   b
5:  5   c
6:  6   c
8:  8   e
9:  9   e

How can I get all the rows?

Thanks.

EDIT: I found that `dt[foo %in% dt[duplicated(dt$foo),]$foo]` seems to work (and makes sense). But is it the simplest way to do this?
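
A possibly simpler variant, offered as a sketch rather than a confirmed answer: `duplicated()` takes a `fromLast` argument, so combining a forward pass and a backward pass should flag every occurrence of a repeated value, including the first. The names `dup_any` and `all_dups` are only illustrative.

```r
library(data.table)

dt <- data.table(id = c(1,2,3,4,5,6,7,8,9),
                 foo = c("a","b","b","b","c","c","d","e","e"))

# duplicated(x) marks every occurrence after the first one;
# duplicated(x, fromLast = TRUE) marks every occurrence before the last one.
# OR-ing the two marks every row whose foo value occurs more than once.
dup_any <- duplicated(dt$foo) | duplicated(dt$foo, fromLast = TRUE)

# Logical subsetting keeps the original row order.
all_dups <- dt[dup_any]
print(all_dups)
```

This avoids the intermediate subset and the `%in%` lookup, though whether it is actually simpler or faster on large tables is something I have not measured.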

François M.
  • You can look up `dplyr`-based solutions and there should be many answers. One option is `dt %>% group_by(foo) %>% filter(n() > 1)` – Gopala Jun 01 '16 at 14:19
  • `dt[dt[, .I[.N > 1L], foo]$V1]` – rawr Jun 01 '16 at 14:37
  • http://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-then-once seems like a more relevant dup – rawr Jun 01 '16 at 14:41
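
For readers unfamiliar with the grouped idiom in rawr's comment, here is how I read it, unpacked into two steps. This is a sketch under the assumption that the table matches the example in the question; `rows` is a hypothetical intermediate name.

```r
library(data.table)

dt <- data.table(id = c(1,2,3,4,5,6,7,8,9),
                 foo = c("a","b","b","b","c","c","d","e","e"))

# Within each foo group, .I holds the row numbers (in dt) of that group
# and .N is the group size. .N > 1L is a single TRUE/FALSE that is recycled,
# so a group contributes all of its row numbers or none of them.
# The unnamed result column is called V1 by default.
rows <- dt[, .I[.N > 1L], by = foo]$V1

# Use the collected row numbers to subset the original table.
result <- dt[rows]
print(result)
```

Note that the rows come back ordered by group (in order of first appearance of each `foo` value), which coincides with the original order here only because equal values are adjacent in the example.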
