Combination of two lists with partial string matching (in R)

Question

I am trying to find all the combinations of two lists; however, the second list is essentially a repetition of the first list's variables with added brackets etc., as shown below.

other_cols <- c("C", "D", "E", "F")

other_colsRnd <- c("(1|C)", "(1|D)", "(1|E)", "(1|F)")

# I have some code to do combinations from one list:

combos = do.call(c, lapply(seq_along(other_cols), function(y) {
  arrangements::combinations(other_cols, y, layout = "l")
}))

theBigList = sapply(combos, paste, collapse = " + ")

> theBigList
 [1] "C"             "D"             "E"             "F"             "C + D"         "C + E"         "C + F"         "D + E"         "D + F"        
[10] "E + F"         "C + D + E"     "C + D + F"     "C + E + F"     "D + E + F"     "C + D + E + F"

I would like theBigList to hold the full list of combinations of both vectors combined, with no entry containing both a variable and its bracketed version (e.g. C and (1|C)).

########

Edit: C, D, etc. are shorthand versions of the "real" variables, which look more like:

other_cols <- c("Charlie", "Delta", "Echo", "Foxtrot")
other_colsRnd <- c("(1|Charlie)", "(1|Delta)", "(1|Echo)", "(1|Foxtrot)")

########

The expected outcome is something like this, though stored order will not be important.

theBigList
"C"  "(1|C)"  "D"  "(1|D)"  "E"  "(1|E)"  "F"  "(1|F)"  "C + D"
"C + (1|D)"  "C + E"  "C + (1|E)"  "C + F"  "C + (1|F)"
"D + E"  "D + (1|E)"  "D + F"  "D + (1|F)"      
"E + F"  "E + (1|F)"
"C + D + E"  "(1|C) + D + E"  "(1|C) + (1|D) + E"  "(1|C) + (1|D) + (1|E)" etc.

Is there a way to put the lapply inside the lapply?

Or, I am currently thinking I can build a combosRnd the same way, e.g.

combosRnd = do.call(c, lapply(seq_along(other_cols), function(y) {
  arrangements::combinations(other_colsRnd, y, layout = "l")
}))

and then take inspiration from here, using var_comb <- expand.grid(combos, combosRnd) with some sort of if and grep to detect the "same" variables, which I haven't worked out yet.

edit

I think I can add the combos together, e.g. something like

theBigList = sapply(combos, paste, collapse = " + ")
theBigListRnd = sapply(combosRnd, paste, collapse = " + ")
comboBigList = c(theBigList, theBigListRnd)
var_comb <- expand.grid(combos, combosRnd)
var_comb2 <- expand.grid(theBigList, theBigListRnd)

... so comboBigList has all the ones where there is no crossover whatsoever, and then I can remove any "lines" in either var_comb or var_comb2 that have anything matching in the var columns.

Yes, this is a smaller, easier chunk of my previously asked question here; however, I have refined it to the bare necessity for me to get this infernal analysis done, as it seems I may have been biting off more than I can chew on that one. I will brute force the nestings I need with this as a supplement (hopefully).

Ronak Shah · Accepted Answer · 2021-08-05T04:44:49.340


Why not combine other_cols and other_colsRnd and use the same code that you already have?

combine_vec <- c(other_cols, other_colsRnd)

combos <- do.call(c, lapply(seq_along(combine_vec), function(y) {
  arrangements::combinations(combine_vec, y, layout = "l")
}))

theBigList = sapply(combos, paste, collapse = " + ")
theBigList

#  [1] "C"                                            
#  [2] "D"                                            
#  [3] "E"                                            
#  [4] "F"                                            
#  [5] "(1|C)"                                        
#  [6] "(1|D)"                                        
#  [7] "(1|E)"                                        
#  [8] "(1|F)"                                        
#  [9] "C + D"                                        
# [10] "C + E"                                        
# [11] "C + F"                                        
# [12] "C + (1|C)" 
#...
#...

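From this theBigList you can drop every entry that contains both a variable and its (1|variable) term using the following code.

library(stringr)

# x: bare single-letter variables (a capital letter not followed by ")")
# y: letters that appear inside a (1|...) term
# An entry is dropped when any letter shows up in both sets.
finalList <- theBigList[!mapply(function(x, y) any(x %in% y) || any(y %in% x), 
    str_extract_all(theBigList, '\\b[A-Z](?!\\))'), 
    str_extract_all(theBigList, '(?<=1\\|)[A-Z]'))]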
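If filtering after the fact feels roundabout, a different route (not the code above, just a sketch in base R) is to build the valid entries directly: each variable is either absent, present as a fixed term, or present as a (1|variable) term, so expand.grid over those three choices per variable can never produce a clash. The names choices, grid and directList are made up for illustration.

```r
other_cols <- c("C", "D", "E", "F")

# For each variable, three choices: absent (NA), fixed ("C"), or random ("(1|C)").
choices <- lapply(other_cols, function(v) c(NA, v, paste0("(1|", v, ")")))

# One row per way of picking a choice for every variable (3^4 = 81 rows).
grid <- expand.grid(choices, stringsAsFactors = FALSE)

# Paste the non-missing terms of each row into one string.
directList <- vapply(seq_len(nrow(grid)), function(i) {
  r <- unlist(grid[i, ], use.names = FALSE)
  paste(r[!is.na(r)], collapse = " + ")
}, character(1))

# The all-absent row collapses to "" and is dropped.
directList <- directList[directList != ""]

length(directList)  # 80, i.e. 3^4 - 1
```

The terms within each entry follow the order of other_cols (so you get "(1|C) + D" rather than "D + (1|C)"), which should be fine if stored order is not important.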

edited Aug 05 '21 at 04:44

answered Aug 04 '21 at 12:45


That works for the initial combination but I can't have a line like "C + (1|C)" in the final list. If there is a list like your `theBigList` (or in the `expand.grid` idea I had before, `var_comb`), then some pseudocode might say: loop through and check for multiples of the variable "C", i.e. `If C & (1|C) are in the same line delete it.` Though it would loop through `other_cols` to check for all the ones to delete. – MrSwaggins Aug 05 '21 at 00:23
So you don't want any combinations of `variable + (1|variable)` in the output? So also drop values like `"(1|C) + (1|D) + D"` ? – Ronak Shah Aug 05 '21 at 02:23
yes, that is correct. My latest attempt doesn't work yet and also seems inelegant: `newBigList <- NULL for (l in combos3){ for (m in other_cols){ if (str_detect(m, paste("(1|", m, ")", sep = "")) == FALSE){ newBigList <- c(combos3[l], newBigList) } } }` – MrSwaggins Aug 05 '21 at 02:52
@MrSwaggins See my updated answer. That might help. – Ronak Shah Aug 05 '21 at 04:45
Almost. Hugely sorry, I should have said that I am only using `C`, `D` etc. as shorthand place holders for the variables. The real variables are more like `other_cols <- c("Charlie", "Delta", "Echo", "Foxtrot") other_colsRnd <- c("(1|Charlie)", "(1|Delta)", "(1|Echo)", "(1|Foxtrot)")` – MrSwaggins Aug 05 '21 at 05:07
You may change the regex based on your actual data. So instead of `[A-Z]` you may use `\\w+` to extract one word instead of one letter. – Ronak Shah Aug 05 '21 at 05:10
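For whole-word names, one way to avoid tuning the regex at all is to strip the "(1|" and ")" wrapper from each term and check whether any bare name then appears twice. This is only a sketch under a few assumptions: it uses base R's combn() in place of arrangements::combinations, and has_clash is a hypothetical helper, not something from stringr or arrangements.

```r
other_cols    <- c("Charlie", "Delta", "Echo", "Foxtrot")
other_colsRnd <- paste0("(1|", other_cols, ")")
combine_vec   <- c(other_cols, other_colsRnd)

# All non-empty subsets of the eight terms, each kept as a character vector.
combos <- do.call(c, lapply(seq_along(combine_vec), function(k) {
  combn(combine_vec, k, simplify = FALSE)
}))

# Hypothetical helper: strip the "(1|" prefix and ")" suffix from each term,
# then report whether any bare variable name occurs more than once.
has_clash <- function(terms) {
  bare <- sub("^\\(1\\|(.*)\\)$", "\\1", terms)
  anyDuplicated(bare) > 0
}

keep      <- !vapply(combos, has_clash, logical(1))
finalList <- vapply(combos[keep], paste, character(1), collapse = " + ")

length(finalList)  # 80
```

Because the check runs on the term vectors before they are pasted together, it does not depend on what characters the variable names contain, as long as the random terms are written exactly as (1|name).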
