I am wondering in data.table
what is the most efficient or cleanest way to select rows based on the occurrence of some column values.
For example, in a 7 column data table, with each value being either 1 or 0, I want all rows where there are exactly 2 values of 1 and 5 values of 0 (1 representing "presence" and 0 "absence").
So far, here is what I am doing, assuming the following data.table
(much bigger, here is just a sample of it)
name D2A1.var D2B3.var D3A1.var D4A3.var D5B3.var H2A3.var H4A4.var MA_ancestor.var
Chrom_1;10000034;G;A Chrom_1;10000034;G;A 1 1 1 1 1 1 1 1
Chrom_1;10000035;G;A Chrom_1;10000035;G;A 1 1 1 1 1 1 1 1
Chrom_1;10000042;C;A Chrom_1;10000042;C;A 1 1 1 1 1 1 1 1
Chrom_1;10000051;A;G Chrom_1;10000051;A;G 1 1 1 1 1 1 1 1
Chrom_1;10000070;G;A Chrom_1;10000070;G;A 1 1 1 1 1 1 1 1
Chrom_1;10000084;C;T Chrom_1;10000084;C;T 1 1 1 1 1 1 1 1
Chrom_6;9997224;AT;A Chrom_6;9997224;AT;A 0 0 0 0 0 1 0 1
Chrom_6;9998654;GTGTGTGTT;G Chrom_6;9998654;GTGTGTGTT;G 0 0 0 0 0 0 0 1
Chrom_6;9999553;TTTC;T Chrom_6;9999553;TTTC;T 0 0 0 0 0 0 0 1
and if I want all rows where I have 7 1 and let's say only 1 in D2A1.var and D3A1.var I am doing the following
ALL = DT[DT$MA_ancestor.var == 1 & DT$D2A1.var == 1 &DT$D2B3.var == 1 & DT$D3A1.var == 1 & DT$D4A3.var == 1 &DT$D5B3.var == 1 & DT$H2A3.var == 1 & DT$H4A4.var == 1,]
TWO = DT[DT$MA_ancestor.var == 0 & DT$D2A1.var == 1 &DT$D2B3.var == 0 & DT$D3A1.var == 1 & DT$D4A3.var == 0 &DT$D5B3.var == 0 & DT$H2A3.var == 0 & DT$H4A4.var == 0,]
DFlist=list(TWO, ALL)
DFlong = rbindlist(DFlist, use.names = TRUE, idcol = FALSE)
This returns the expected result and is fast enough. However when having multiple conditions it's a lot of typing and a lot of data.table
creations. Is there a faster, cleaner and more compact way of achieving this?