I have data that looks like this:
v1 = sample(c("a","b"), 1000, replace = TRUE)
v2 = sample(c("c","d"), 1000, replace = TRUE)
X = cbind(v1, v2)
That is, there are two variables that can take two values each. The goal is to generate an index (or something similar) that splits this data into all possible subsets defined by these two variables. The nine subsets can be described using the following conditions (in a slight abuse of notation):
#1# (a) & (c)
#2# (a) & (d)
#3# (b) & (c)
#4# (b) & (d)
#5# (a) & (c | d)
#6# (b) & (c | d)
#7# (a | b) & (c)
#8# (a | b) & (d)
#9# (a | b) & (c | d)
That is, subset #1 should fulfill (v1 == "a") & (v2 == "c"), subset #5 should fulfill v1 == "a", while subset #9 corresponds to the full data set, and so on.
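To make the intended result concrete, here is a minimal sketch of one way such an index could be built. It rests on a few assumptions that are not part of the question: the data is stored as a data.frame rather than the cbind() matrix, a seed is set only for reproducibility, and "all possible subsets" is read as every non-empty set of values per variable. The names nonempty_subsets, choices, grid and subset_index are made up for this illustration.

```r
# Assumption: data.frame instead of the cbind() matrix; seed only for reproducibility.
set.seed(1)
X <- data.frame(
  v1 = sample(c("a", "b"), 1000, replace = TRUE),
  v2 = sample(c("c", "d"), 1000, replace = TRUE)
)

# Hypothetical helper: all non-empty subsets of a vector of values, as a list.
nonempty_subsets <- function(values) {
  unlist(
    lapply(seq_along(values), function(k) combn(values, k, simplify = FALSE)),
    recursive = FALSE
  )
}

# One list of candidate value sets per column,
# e.g. list("a", "b", c("a", "b")) for v1.
choices <- lapply(X, function(col) nonempty_subsets(sort(unique(as.character(col)))))

# Each row of the grid picks one value set per column (by position in `choices`).
grid <- expand.grid(lapply(choices, seq_along))

# For each grid row, a logical vector marking the rows of X that belong to that subset.
subset_index <- lapply(seq_len(nrow(grid)), function(i) {
  keep <- rep(TRUE, nrow(X))
  for (j in names(X)) {
    keep <- keep & X[[j]] %in% choices[[j]][[grid[i, j]]]
  }
  keep
})

length(subset_index)  # 9 for two variables with two values each
```

With this sketch, X[subset_index[[1]], ] would return the rows with v1 == "a" and v2 == "c". Note that expand.grid() varies the first variable fastest, so the order of the subsets differs from the numbering #1 to #9 above. Since the code loops over names(X), it should carry over to more variables and more values per variable, although the number of subsets grows quickly.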
This question is probably strongly related to this one, and I suspect that what I want can be accomplished using combn(). However, I could not figure out how to extend the answers there to my problem with more than one variable.
It is definitely possible (and at the same time extremely inelegant) to solve this specific problem using a hardcoded loop. However, the solution should generalize to more variables and to a varying number of values for each variable, so hardcoding quickly becomes infeasible.
EDIT: Found an answer in another thread and flagged this one as a duplicate.