I am looking for a function that takes a column of a data.frame as the reference and finds all subsets with respect to the other variable levels. For example, let z
be data frame with 4 columns a,b,c,d
, each column has 2 levels for instance. let a
be the reference. Then z would be like
z$a : TRUE FALSE
z$b : TRUE FALSE
z$c : TRUE FALSE
z$d : TRUE FALSE
Then what I need is a LIST that the elements are combination names such as
aTRUEbTRUEcTRUEdTR UE :subset of the dataframe
aTRUEbFALSEcTRUEdTRUE : subset
...
Here is an example,
set.seed(123)
z=matrix(sample(c(TRUE,FALSE),size = 100,replace = TRUE),ncol=4)
colnames(z) = letters[1:4]
z=as.data.frame(z)
output= list(
'bTUEcTRUEdFALSE' = subset(z,b==TRUE & c==TRUE & d==FALSE),
'bTRUEcTRUEdTRUE' = subset(z,b==TRUE & c==TRUE & d==TRUE),
'bTRUEcFALSEdFALSE' = subset(z,b==TRUE & c==FALSE & d==FALSE),
'bTRUEcFALSEdTRUE' = subset(z,b==TRUE & c==FALSE & d==TRUE)
# and so on ...
)
output
$bTUEcTRUEdFALSE
a b c d
13 FALSE TRUE TRUE FALSE
14 FALSE TRUE TRUE FALSE
$bTRUEcTRUEdTRUE
a b c d
4 FALSE TRUE TRUE TRUE
10 TRUE TRUE TRUE TRUE
16 FALSE TRUE TRUE TRUE
20 FALSE TRUE TRUE TRUE
24 FALSE TRUE TRUE TRUE
$bTRUEcFALSEdFALSE
a b c d
17 TRUE TRUE FALSE FALSE
19 TRUE TRUE FALSE FALSE
22 FALSE TRUE FALSE FALSE
$bTRUEcFALSEdTRUE
a b c d
5 FALSE TRUE FALSE TRUE
11 FALSE TRUE FALSE TRUE
15 TRUE TRUE FALSE TRUE
18 TRUE TRUE FALSE TRUE
21 FALSE TRUE FALSE TRUE
23 FALSE TRUE FALSE TRUE
However, there is an issue with the example. firstly, I do not know the number of variables (in this case 4 (a to d). Secondly, the name of the variables must be caught from the data (simple speaking, I cannot use subset since I do not know the variable name in the condition (a== can be anything==)
What is the most efficient way of doing this in R?