R script to generate all combinatorics of two identical lists including incomplete lists

Question

I think this problem can be solved in many different ways, but I basically want a function that gives me a dataframe with every combination of values from a list spread across its columns, including the incomplete sets and excluding some, but not all, redundant combinations (order isn't important for now).

So I might start out with a list like this:

 List = c("A","B","C")

and I want to get a dataframe that looks like

C1 = c("A","B","C","A","A","B","A")
C2 = c("","","","B","C","C","B")
C3 = c("","","","","","","C")
df <- cbind(C1, C2, C3) 
row.names(df) <-  c("A", "B", "C", "AB", "AC", "BC", "ABC")
colnames(df) <- c("First_Item", "Second_Item","Third_Item")

And then it fills in each cell with the corresponding letter. e.g. position A1 in the df would be "A", positions A2 and A3 would be empty.

Any idea how to do this?

I tried with tidyr:

library(tidyr)
list_1 = c("A", "B", "C", "NA")
list_2 = c("A", "B", "C", "NA")
list_3 = c("A", "B", "C", "NA")
list_4 = c("A", "B", "C", "NA")
test <- crossing(list_1, list_2,list_3,list_4)
test <- test[apply(test, MARGIN = 1, FUN = function(x) !any(duplicated(x))), ]

But I want to keep all the values with multiple NAs in them, so this doesn't quite work.

expand.grid has the same problem

expand.grid(list_1 = c("A", "B", "C", "NA"),list_2 = c("A", "B", "C", "NA"),list_3 = c("A", "B", "C", "NA"),list_4 = c("A", "B", "C", "NA"))

`?combn` + `?paste0` + `for (i in seq_along(List))` ? I don't understand your output example: there are two vectors there with different lengths. Can you be more specific? (What happens if you replace `"NA"` with `""` in your examples?) — Ben Bolker, Jun 09 '21 at 23:17
Hi @BenBolker, ok I fixed the code - does that make more sense now? — Alison R., Jun 09 '21 at 23:28

score 3 · Answer 1 · edited Nov 08 '21 at 02:10

That's basically Roland's answer:

library(magrittr) # just for the pipe-operator

List %>%
  seq_along() %>%
  lapply(combn, x = List, simplify = FALSE) %>%
  unlist(recursive = FALSE) %>%
  sapply(`length<-`, length(List)) %>%
  t() %>%
  data.frame()

returns

  X1   X2   X3
1  A <NA> <NA>
2  B <NA> <NA>
3  C <NA> <NA>
4  A    B <NA>
5  A    C <NA>
6  B    C <NA>
7  A    B    C

Furthermore, you could use the dplyr and tidyr packages to replace the NAs. Just add one more function to the pipe:

mutate(across(everything(), ~ replace_na(.x, "")))
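For anyone who wants to avoid the extra packages, here is a minimal base R sketch of the same idea, with the empty cells and the row and column names from the question. The intermediate names `subsets` and `padded` are just illustrative, and the column names are hard-coded for a three-element `List`.

```r
# Sketch: all non-empty subsets of `List`, padded with "" to equal width.
List <- c("A", "B", "C")

# One list element per subset, for subset sizes 1..length(List)
subsets <- unlist(
  lapply(seq_along(List), function(m) combn(List, m, simplify = FALSE)),
  recursive = FALSE
)

# Pad each subset to length(List); `length<-` fills the tail with NA
padded <- t(sapply(subsets, `length<-`, length(List)))

# Swap the NA padding for empty strings, as in the question
padded[is.na(padded)] <- ""

df <- data.frame(padded, stringsAsFactors = FALSE)
colnames(df) <- c("First_Item", "Second_Item", "Third_Item")
rownames(df) <- sapply(subsets, paste, collapse = "")
df
```

The row names are built by pasting each subset together, so they are only guaranteed to be unique when no pasted subset collides with another (true for single letters like these).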

edited Nov 08 '21 at 02:10 by Nimantha

answered Jun 09 '21 at 23:32 by Martin Gal

Interestingly enough, this is an occasion when the base pipe `|>` gives the same result, if you want to remove the *magrittr* dependency – thelatemail Jun 10 '21 at 00:39
@thelatemail Didn't know base R already has a pipe operator. Awesome! – Martin Gal Jun 10 '21 at 07:28
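
As a rough sketch of what that comment describes, the same pipeline written with the base pipe might look like the following. This assumes R 4.1.0 or later, where `|>` is available; the name `result` is only for illustration.

```r
# Same steps as the magrittr version, using the native pipe (R >= 4.1.0)
List <- c("A", "B", "C")

result <- List |>
  seq_along() |>                                # subset sizes 1, 2, 3
  lapply(combn, x = List, simplify = FALSE) |>  # subsets of each size
  unlist(recursive = FALSE) |>                  # flatten to one list of subsets
  sapply(`length<-`, length(List)) |>           # pad each subset with NA
  t() |>                                        # one row per subset
  data.frame()

result
```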

ktiu · Answer 2 · 2021-06-10T00:24:05.263

2

Here is my approach:

library(purrr)

List <- c("xA","xB","xC") # arbitrary as per request in comments

seq_along(List) %>% # h/t @MartinGal
  map(~ combn(List, m = .x) %>%
          apply(2, paste, collapse = "<!>")) %>%
  unlist() %>%
  tibble::tibble() %>%
  tidyr::separate(1, into = c("First_Item", "Second_Item", "Third_Item"),
                  sep = "<!>")

Returns:

# A tibble: 7 x 3
  First_Item Second_Item Third_Item
  <chr>      <chr>       <chr>
1 xA         NA          NA
2 xB         NA          NA
3 xC         NA          NA
4 xA         xB          NA
5 xA         xC          NA
6 xB         xC          NA
7 xA         xB          xC

edited Jun 10 '21 at 00:24

answered Jun 09 '21 at 23:36 by ktiu

Please add the packages you used. And instead of `1:length(List)` you can use `seq_along(List)`, see [Advanced R](https://adv-r.hadley.nz/control-flow.html#common-pitfalls). – Martin Gal Jun 09 '21 at 23:40
This is promising but is there a way for this to work for longer strings? e.g instead of A, B, C something like ItemA, ItemB, ItemC? – Alison R. Jun 09 '21 at 23:46
Updated my answer! Thanks @MartinGal for the pointer! – ktiu Jun 10 '21 at 00:18
