
Suppose we have the following data.table:

library(data.table)    
dt <- data.table(Date = c(201405,201405,201504,201505, 201505,201505), ID = c(500,500,600,700,500, 700), INC = c(20,30,50,75,80,90))

which prints as:

     Date  ID INC
1: 201405 500  20
2: 201405 500  30
3: 201504 600  50
4: 201505 700  75
5: 201505 500  80
6: 201505 700  90

I want to remove every row whose ID appears more than once within the same Date. The result should be:

     Date  ID INC
1: 201504 600  50
2: 201505 500  80

Could you please suggest how to do this?

newbie

1 Answer


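We group by 'ID' and build a logical index with duplicated on 'Date', scanning from both ends so that every occurrence of a repeated Date is flagged. Negating that index leaves TRUE only for Dates that occur once within the ID. We then use .I to get the row numbers of those rows, extract the index column 'V1', and use it to subset 'dt'.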

dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
#      Date  ID INC
#1: 201505 500  80
#2: 201504 600  50
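To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The intermediate names `flags` and `keep_rows` are only illustrative and do not appear in the original code.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Step 1: within each ID, flag every Date that occurs more than once.
# duplicated() alone skips the first occurrence, so it is run from both ends.
flags <- dt[, .(row = .I,
                repeated = duplicated(Date) | duplicated(Date, fromLast = TRUE)),
            by = ID]

# Step 2: keep the original row numbers of the unflagged rows.
# They come back in group order (ID 500 first, then 600), i.e. 5 then 3.
keep_rows <- flags$row[!flags$repeated]

# Step 3: subset by row number.
result <- dt[keep_rows]

# sort() restores the original row order if that matters.
result_sorted <- dt[sort(keep_rows)]
```

The result comes out in group order rather than original row order, which is why ID 500 is listed before ID 600 in the output above.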

Another option is to group by 'Date' and 'ID' and, if the group has exactly one row (.N == 1), return the Subset of Data (.SD) for that group.

dt[, if(.N==1) .SD, .(Date, ID)]
#     Date  ID INC
#1: 201504 600  50
#2: 201505 500  80
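As a rough illustration of what the grouping does, the sketch below first lists the group sizes and then shows a variant that selects row numbers with `.I` instead of returning `.SD`. The names `counts`, `idx` and `res` are illustrative only.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Size of each (Date, ID) group, in order of first appearance.
# Only the groups with N == 1 pass the `if (.N == 1)` test.
counts <- dt[, .N, by = .(Date, ID)]

# Variant: collect the row numbers of single-row groups, then subset once.
# .I[.N == 1L] is the group's row numbers when N is 1 and empty otherwise.
idx <- dt[, .I[.N == 1L], by = .(Date, ID)]$V1
res <- dt[idx]
```

Subsetting by row number keeps the columns in their original order, whereas returning `.SD` by group puts the grouping columns first. Here the two happen to coincide.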

Or as @Frank mentioned, we can use a data.table/base R combo

# ave() returns integers here (same type as seq(.N)), so convert to a logical index
dt[as.logical(ave(seq(.N), Date, ID, FUN = function(x) length(x) == 1L))]
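To see what `ave` contributes, here is a small sketch that computes the group size per row outside of `dt[...]`. It uses `FUN = length` rather than the function shown above, and the names `grp_size` and `keep` are illustrative. One point worth knowing: `ave` returns a vector of the same type as its first argument, so a TRUE/FALSE result from `FUN` would be stored as integer 1/0. Comparing the group size explicitly yields a proper logical index.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# One value per row: the number of rows sharing that row's (Date, ID) pair.
grp_size <- ave(seq_len(nrow(dt)), dt$Date, dt$ID, FUN = length)

# Logical index: TRUE where the (Date, ID) pair occurs exactly once.
keep <- grp_size == 1L

res <- dt[keep]
```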
akrun
  • I had a similar one in `dt[dt[, !(duplicated(ID)|duplicated(ID,fromLast=TRUE)), by=Date]$V1]` – thelatemail Oct 21 '15 at 06:36
  • @thelatemail That also looks right. I usually go with the `.I` – akrun Oct 21 '15 at 06:39
  • I'd consider `DT[ave(seq(.N), Date, ID, FUN = function(x) length(x) == 1L)]`. Not very data.table-ish, but it doesn't require using `by` or scanning the vector with `duplicated` twice. – Frank Oct 21 '15 at 13:50
  • @Frank - it effectively does use `by` though, in that `ave` calls `lapply` + `split` internally. – thelatemail Oct 21 '15 at 23:04
  • @thelatemail On the basis of nothing whatsoever, I suspect splitting data.frames with `by` has more overhead than splitting a single vector... Oh -- just tested to see if I could make an example and my R session hung on the `ave` after doing the `if (...) .SD` in half a second, so... yep. – Frank Oct 21 '15 at 23:42