I'm trying to figure out the best way to filter on multiple variables in R.
I usually have up to 100 variables in one condition and need to keep the cases where ANY of these variables satisfies the same condition (e.g. VARx == 170). The names and the number of variables often differ between runs, and they are entered as a string to be evaluated. This filtering is the bottleneck of my whole computation.
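To make the "string to be evaluated" part concrete, here is a minimal sketch of how such a condition might be assembled and evaluated. The names `toy`, `vars`, `target`, `cond` and `keep` are hypothetical and only serve the illustration; the real variable lists are much longer.

```r
# Hypothetical toy data: three variables instead of up to 100.
toy <- data.frame(id = 1:6,
                  x1 = c(37, 1, 2, 3, 4, 5),
                  x2 = c(1, 2, 37, 4, 5, 6),
                  x3 = c(9, 8, 7, 6, 5, 37))

# Assumed inputs: variable names and target value are only known at run time.
vars <- c("x1", "x2", "x3")
target <- 37

# Build the condition "(x1==37) | (x2==37) | (x3==37)" as a string ...
cond <- paste0("(", vars, "==", target, ")", collapse = " | ")

# ... then parse it and evaluate it with the data frame columns in scope.
keep <- eval(parse(text = cond), envir = toy)

# Rows where at least one of the variables equals the target.
toy[keep, ]
```

The string has to be rebuilt whenever the set of variables changes, which is the part I find tedious for long variable lists.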
Example (filtering for VARx == 37):
id <- c(1:100000)
x1 <- sample(1:100, 100000, replace=T)
x2 <- sample(1:100, 100000, replace=T)
x3 <- sample(1:100, 100000, replace=T)
x4 <- sample(1:100, 100000, replace=T)
x5 <- sample(1:100, 100000, replace=T)
x6 <- sample(1:100, 100000, replace=T)
x7 <- sample(1:100, 100000, replace=T)
x8 <- sample(1:100, 100000, replace=T)
x9 <- sample(1:100, 100000, replace=T)
x10 <- sample(1:100, 100000, replace=T)
df <- data.frame(id, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)
library(data.table)  # needed for data.table() below
dt <- data.table(df)
pm<-proc.time()
vys<-((x1==37) | (x2==37) | (x3==37) | (x4==37) | (x5==37) | (x6==37) | (x7==37) | (x8==37) | (x9==37) | (x10==37))
proc.time() - pm
pm<-proc.time()
vys<-((rowSums(subset(df,select=c(x1:x10))==37)>0))
proc.time() - pm
The first statement takes less time, but it is longer and more tedious to prepare. The second is slower, yet more concise. I have tried to incorporate data.table into my computation, but without success (i.e. without getting better computation times).
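One middle ground that may be worth timing is sketched below: compare each column to the target separately and OR the resulting logical vectors with `Reduce`, which avoids both the hand-written condition and the conversion to a matrix that `rowSums` needs. This is only a sketch on toy data, not a benchmark; `toy`, `vars`, `target`, `hits` and `hits_rs` are hypothetical names.

```r
# Hypothetical toy data: three variables instead of up to 100.
toy <- data.frame(id = 1:6,
                  x1 = c(37, 1, 2, 3, 4, 5),
                  x2 = c(1, 2, 37, 4, 5, 6),
                  x3 = c(9, 8, 7, 6, 5, 37))

# Assumed inputs: a character vector of column names and the value to match.
vars <- c("x1", "x2", "x3")
target <- 37

# Compare every selected column to the target, then OR the logical vectors.
hits <- Reduce(`|`, lapply(toy[vars], function(col) col == target))

# The rowSums formulation on the same columns, for comparison.
hits_rs <- rowSums(toy[vars] == target) > 0

# Both approaches should flag the same rows.
all(hits == hits_rs)
```

Because the columns are selected by name from a character vector, nothing has to be pasted together and parsed when the variable list changes.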
Am I missing a better way to do this filtering?
(Changing the data structure or the coding of the variables might, of course, be a solution. Still, I would like to explore this kind of multiple filtering.)