
I need help writing code that creates every dataset combination (every non-empty subset of columns) from a data frame in R.

E.g. dataframe =

        | A  B  C |
        | 1  4  7 |
        | 2  5  8 |
        | 3  6  9 |

Dataset combinations: A, B, C, AB, AC, BC, ABC

data1 =

    | A |
    | 1 |
    | 2 |
    | 3 |

data2 =

    | B |
    | 4 |
    | 5 |
    | 6 |

data3 =

    | C |
    | 7 |
    | 8 |
    | 9 |

data4 =

    | A  B | 
    | 1  4 |
    | 2  5 |
    | 3  6 |

data5 =

    | A  C |
    | 1  7 |
    | 2  8 |
    | 3  9 |

data6 =

    | B  C |
    | 4  7 |
    | 5  8 |
    | 6  9 |

data7 =

    | A  B  C |
    | 1  4  7 |
    | 2  5  8 |
    | 3  6  9 |

Kind regards.

  • This is called the power set in mathematics. Here is a [related post](http://stackoverflow.com/questions/18715580/algorithm-to-calculate-power-set-all-possible-subsets-of-a-set-in-r). To make this work for your example, use the resulting sets to select subsets of the data frame in a loop or `lapply`. – lmo Oct 19 '16 at 17:51
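A minimal sketch of the loop approach the comment describes, assuming the example data frame from the question. The names `k`, `mask`, `keep` and `subsets` are illustrative, and the binary-counter enumeration used here is just one way to walk the power set, not necessarily the one in the linked post:

```r
# Example data frame from the question.
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

k <- ncol(df)
subsets <- vector("list", 2^k - 1)   # the empty set is skipped

# Each integer from 1 to 2^k - 1 encodes one subset: binary digit j of
# `mask` (counting from the lowest digit) says whether column j is kept.
for (mask in seq_len(2^k - 1)) {
  keep <- (mask %/% 2^(seq_len(k) - 1)) %% 2 == 1
  subsets[[mask]] <- df[keep]
}

length(subsets)        # number of non-empty column subsets
names(subsets[[5]])    # mask 5 is binary 101, i.e. columns A and C
```

Note that this enumerates the subsets in binary-counter order (A, B, AB, C, ...) rather than grouped by size as in the question.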

1 Answer

Here is a solution using `lapply` and `combn` (to get the variable-name combinations):

    colNameSet <- unlist(lapply(seq_len(length(df)),
                               function(i) combn(names(df), i, simplify=FALSE)), recursive=FALSE)

    myList <- lapply(colNameSet, function(x) df[x])

The vector `1, ..., k`, where `k` is the number of variables, is fed to `lapply`, which tells `combn` the size of the variable combinations to generate in each iteration. Each `combn` call returns a list because of the `simplify=FALSE` argument, so the result of `lapply` is a nested list. `unlist` with `recursive=FALSE` flattens it by one level into a single list of variable-name vectors.

The second line runs through this list of variable names and subsets the data frame based on the contents of each element.
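As a self-contained sketch of the two steps above: the data frame is rebuilt here so the snippet runs on its own, and the naming step with the `subsetNames` variable is an illustrative addition, not part of the original answer. It labels each list element by the columns it contains, which makes individual subsets easier to retrieve:

```r
# Rebuild the example data frame from the question.
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Step 1: all non-empty combinations of column names, grouped by size.
colNameSet <- unlist(
  lapply(seq_along(df), function(i) combn(names(df), i, simplify = FALSE)),
  recursive = FALSE
)

# Step 2: one data frame per combination of names.
myList <- lapply(colNameSet, function(x) df[x])

# Illustrative addition: label each element, e.g. "AB" for columns A and B.
subsetNames <- vapply(colNameSet, paste, character(1), collapse = "")
names(myList) <- subsetNames

names(myList)     # labels in the order A, B, C, AB, AC, BC, ABC
myList[["AC"]]    # the data frame holding columns A and C
```

The number of subsets is 2^k - 1 for k columns, so this approach is only practical for a modest number of variables.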

This returns:

    > myList
    [[1]]
      A
    1 1
    2 2
    3 3

    [[2]]
      B
    1 4
    2 5
    3 6
    ...

    [[6]]
      B C
    1 4 7
    2 5 8
    3 6 9

    [[7]]
      A B C
    1 1 4 7
    2 2 5 8
    3 3 6 9

Data:

    df <- data.frame(matrix(1:9, 3))
    names(df) <- LETTERS[1:3]
lmo