Building on my previous question, I have the following data.table
:
> dt = data.table(V1 = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4),
V2 = c(4, 4, 4, 5, 5, 6, 6, 7, 7, 4, 5),
ID = c(1, 2, 3, 1, 2, 2, 2, 1, 3, 1, 1))
V1 V2 ID
1: 1 4 1
2: 1 4 2
3: 1 4 3
4: 2 5 1
5: 2 5 2
6: 2 6 2
7: 2 6 2
8: 2 7 1
9: 2 7 3
10: 3 4 1
11: 4 5 1
with many duplicate rows (when looking at V1
and V2
).
I want to remove all the "entirely duplicate" rows (those where V2
is always the same for a given V1
), including the ones with a unique V1
(since all one V2
is equal to itself).
I was given many solutions in the previous question linked above, including @akrun's (modified for this new table):
> newDT = unique(dt[dt[, .(i1 = .I[uniqueN(V2) > 1]), V1]$i1], by = c("V1", "V2"))
V1 V2 ID
1: 2 5 1
2: 2 6 2
3: 2 7 3
However, I actually need to store a list of ID
s which had a given result. I can collect the list of ID
s for a given pair of (V1, V2)
trivially with
> unique(dt[V1 == 2 & V2 == 5, ID])
[1] 1 2
(unique
is necessary because a given (V1,V2)
pair may also have duplicate ID
, see dt
rows 6:7) and add it with
newDT[, ID := .(.(unique(dt[V1 == 2 & V2 == 5, ID])))]
but I can't figure out how to "iterate" over each (V1,V2)
pair.
In summary, I need the output to be
> newDT
V1 V2 ID
1: 2 5 (1, 2)
2: 2 6 (2)
3: 2 7 (1, 3)