Finding all possible subsets of a data frame

Question

I am looking for a function that takes one column of a data.frame as the reference and finds all subsets defined by the levels of the other variables. For example, let z be a data frame with 4 columns a, b, c, d, each with 2 levels, and let a be the reference. Then z would look like

z$a : TRUE FALSE
z$b : TRUE FALSE
z$c : TRUE FALSE
z$d : TRUE FALSE

What I need is a LIST whose element names are the level combinations, such as

aTRUEbTRUEcTRUEdTRUE : subset of the data frame
aTRUEbFALSEcTRUEdTRUE : subset
...

Here is an example,

set.seed(123)
z=matrix(sample(c(TRUE,FALSE),size = 100,replace = TRUE),ncol=4)
colnames(z) = letters[1:4]
z=as.data.frame(z)


output= list(
    'bTRUEcTRUEdFALSE' = subset(z,b==TRUE & c==TRUE & d==FALSE),
    'bTRUEcTRUEdTRUE' = subset(z,b==TRUE & c==TRUE & d==TRUE),
    'bTRUEcFALSEdFALSE' = subset(z,b==TRUE & c==FALSE & d==FALSE),
    'bTRUEcFALSEdTRUE' = subset(z,b==TRUE & c==FALSE & d==TRUE)
    # and so on ...
)
output
$bTRUEcTRUEdFALSE
       a    b    c     d
13 FALSE TRUE TRUE FALSE
14 FALSE TRUE TRUE FALSE

$bTRUEcTRUEdTRUE
       a    b    c    d
4  FALSE TRUE TRUE TRUE
10  TRUE TRUE TRUE TRUE
16 FALSE TRUE TRUE TRUE
20 FALSE TRUE TRUE TRUE
24 FALSE TRUE TRUE TRUE

$bTRUEcFALSEdFALSE
       a    b     c     d
17  TRUE TRUE FALSE FALSE
19  TRUE TRUE FALSE FALSE
22 FALSE TRUE FALSE FALSE

$bTRUEcFALSEdTRUE
       a    b     c    d
5  FALSE TRUE FALSE TRUE
11 FALSE TRUE FALSE TRUE
15  TRUE TRUE FALSE TRUE
18  TRUE TRUE FALSE TRUE
21 FALSE TRUE FALSE TRUE
23 FALSE TRUE FALSE TRUE

However, there are two issues with this example. First, I do not know the number of variables in advance (in this case 4, a to d). Second, the variable names must be taken from the data: simply put, I cannot use subset because I do not know which variable names will appear in the condition (a== could be anything==).

What is the most efficient way of doing this in R?

Unclear. Please provide a [reproducible example](http://stackoverflow.com/questions/5963269) and expected output — Sotos, Jan 10 '18 at 14:54

score 1 · Accepted Answer · answered Jan 10 '18 at 17:40

You can use split and paste like so:

split(z, paste(z$b, z$c, z$d))

But the tricky part of your question is how to programmatically combine the variables in columns 2:end without knowing beforehand the number of columns, their names, or their values. A function like the one below pastes together, row by row, the values in every column except the first:

apply(z, 1, function(i) paste(i[-1], collapse=""))

Now combine with split

split(z, apply(z, 1, function(i) paste(i[-1], collapse="")))
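If the list names should also carry the column names, as in the bTRUEcTRUEdFALSE labels from the question, the same idea can be wrapped in a small helper. The sketch below is only one way to do it: the function name split_by_others and its ref argument are hypothetical, and it assumes the data frame has at least one column besides the reference.

```r
# Hypothetical helper (not from the original answer): split a data frame
# by every column except the reference column `ref`.
split_by_others <- function(data, ref = names(data)[1]) {
  others <- setdiff(names(data), ref)
  # One character vector per non-reference column, e.g. "bTRUE", "bFALSE", ...
  parts <- lapply(others, function(nm) paste0(nm, data[[nm]]))
  # Concatenate the pieces element-wise into one label per row
  key <- do.call(paste0, parts)
  split(data, key)
}

# Rebuild the example data from the question
set.seed(123)
z <- as.data.frame(matrix(sample(c(TRUE, FALSE), size = 100, replace = TRUE), ncol = 4))
names(z) <- letters[1:4]

res <- split_by_others(z, ref = "a")
names(res)
```

Working column by column, rather than row by row with apply, avoids converting the data frame to a matrix first. Only combinations that actually occur in the data appear in the result. If empty combinations are needed too, split also accepts a list of grouping factors, so split(z, z[-1]) should give a comparable grouping, with names of the form TRUE.FALSE.TRUE.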
