I recommend using data.table for this (fread in data.table will also be quite handy for reading in the large data set, since you say you have enough RAM).
I am not sure that the following is the best way to do this in data.table but, at least, it should get you started. Hopefully, someone else will come along and list the most idiomatic data.table way for this. But this is what I can think of right now:
Assume your data.table is called DT and has two columns, dmg and O_Y. Use O_Y as the key for DT and subset DT for O_Y == 1 (DT[.(1)] in data.table syntax). Now find the corresponding dmg values. The unique of these dmg values is your keys.with.ones. All this is succinctly done as follows:
setkey(DT, O_Y)
# In current data.table versions, DT[.(1), dmg] already returns dmg as a plain vector
keys.with.ones <- unique(DT[.(1), dmg])
Next, we need to extract rows corresponding to these values of dmg. For this we need to change the key for DT to dmg and extract the rows corresponding to the keys above:
setkey(DT, dmg)
DT.filtered <- DT[.(keys.with.ones)]
And we are done. :)
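If it helps to see the whole thing run end to end, here is a minimal sketch on a toy table. The six rows are made up purely for illustration (your real data would come from fread), and I am assuming dmg is a grouping id and O_Y a 0/1 flag, as described above.

```r
library(data.table)

# Toy data (hypothetical values): dmg is the group id, O_Y the 0/1 flag
DT <- data.table(dmg = c("a", "a", "b", "b", "c", "c"),
                 O_Y = c(0, 1, 0, 0, 1, 1))

# Step 1: key on O_Y and collect the dmg values that have at least one O_Y == 1
setkey(DT, O_Y)
keys.with.ones <- unique(DT[.(1), dmg])

# Step 2: re-key on dmg and pull every row belonging to those dmg values
setkey(DT, dmg)
DT.filtered <- DT[.(keys.with.ones)]

print(keys.with.ones)
print(DT.filtered)
```

Here group "b" never has O_Y == 1, so all of its rows are dropped, while every row of "a" and "c" is kept (including the O_Y == 0 row of "a"). A version without keys that should select the same rows is DT[dmg %in% DT[O_Y == 1, dmg]], which may be easier to read if you do not need the speed of keyed joins.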
Please refer to ?data.table to figure out a better method if possible and let us know.