I am trying to build a vector of strings as an input for model testing (it eventually goes into the lmer function). I have to change the columns around a lot for different tests, so declaring them in one place at the start would really speed up the process.

The vector (of strings) is made up of column headings (from the data). There are currently two fixed starting points, and then I would like to iterate through the available column options without repetition and where order is not important.

Example input:

first_col <- "SpA"
secondFixedcol <- "SpecB"
other_cols <- c("C", "D", "E", "F") #This can have any number of parameters

Example output for model text:

modelsText <- c('SpecB',
                'SpA + SpecB',
                'SpA + SpecB + C',
                'SpA + SpecB + D',
                'SpA + SpecB + E',
                'SpA + SpecB + F',
                    'SpA + SpecB',
                    'SpA + SpecB + C + D',
                    'SpA + SpecB + C + E',
                    'SpA + SpecB + C + F',
                        'SpA + SpecB + C + D + E',
                        'SpA + SpecB + C + D + F',
                            'SpA + SpecB + C + D + E + F')

My instinct is to build some sort of Frankenstein for loop using the paste function (which is still beyond me at this stage), but surely there is something more elegant using vectorisation?

My other idea is to use combinations(4, 3, other_cols, repeats.allowed = FALSE)

and then use a nested for loop to move through the result?

MrSwaggins
  • I'm not sure if this is what you wanted, but you could check https://stackoverflow.com/questions/40049313/generate-all-combinations-of-all-lengths-in-r-from-a-vector – sammy Nov 30 '20 at 05:42

1 Answer

Taking inspiration from this answer,

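# For each subset size y, list every combination of other_cols
# (layout = "l" returns them as a list of character vectors)
combos = do.call(c, lapply(seq_along(other_cols), function(y) {
  arrangements::combinations(other_cols, y, layout = "l")
}))

# Collapse each combination into a single "C + D"-style string
formulas = sapply(combos, paste, collapse = " + ")

# Prepend the two fixed columns to every combination
formulas = paste(first_col, secondFixedcol, formulas, sep = " + ")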
formulas
#  [1] "SpA + SpecB + C"             "SpA + SpecB + D"             "SpA + SpecB + E"            
#  [4] "SpA + SpecB + F"             "SpA + SpecB + C + D"         "SpA + SpecB + C + E"        
#  [7] "SpA + SpecB + C + F"         "SpA + SpecB + D + E"         "SpA + SpecB + D + F"        
# [10] "SpA + SpecB + E + F"         "SpA + SpecB + C + D + E"     "SpA + SpecB + C + D + F"    
# [13] "SpA + SpecB + C + E + F"     "SpA + SpecB + D + E + F"     "SpA + SpecB + C + D + E + F"

I'll leave it to you to add the formulas that don't involve any of the other_cols - just tacking them on the front with c() should be fine.
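For anyone who wants the complete vector in one go, here is a minimal sketch of that last step. It assumes the same three input variables as the question and swaps in base R's combn() for arrangements::combinations() purely to avoid the extra dependency; the names extra_terms and base_terms are made up for illustration.

```r
# Inputs copied from the question
first_col <- "SpA"
secondFixedcol <- "SpecB"
other_cols <- c("C", "D", "E", "F")

# Every non-empty subset of other_cols, ordered by size.
# combn(..., simplify = FALSE) returns a list of character vectors;
# unlist(recursive = FALSE) flattens the list-of-lists by one level.
combos <- unlist(
  lapply(seq_along(other_cols), function(k) combn(other_cols, k, simplify = FALSE)),
  recursive = FALSE
)

# "C", "D", ..., "C + D", ..., "C + D + E + F"
extra_terms <- vapply(combos, paste, character(1), collapse = " + ")

# The part shared by every formula that uses both fixed columns
base_terms <- paste(first_col, secondFixedcol, sep = " + ")

# Tack the formulas without any other_cols on the front with c()
modelsText <- c(
  secondFixedcol,
  base_terms,
  paste(base_terms, extra_terms, sep = " + ")
)
```

With four entries in other_cols this gives 2 + (2^4 - 1) = 17 strings, and it keeps working for any number of entries because the subset sizes come from seq_along(other_cols).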

Gregor Thomas
  • Very nice. I will edit the original question to make it clearer, as I meant a vector of strings, but you clearly understood what I meant as the answer is working well. – MrSwaggins Nov 30 '20 at 07:37
  • I'm confused by this comment - this does result in a vector of strings, almost identical to the one shown in your question. It is just missing the first couple that you show, `'SpecB', 'SpA + SpecB'`, since I only constructed the ones involving `other_cols`. – Gregor Thomas Nov 30 '20 at 13:22
  • Sorry, I still struggle somewhat with vectorisation (as opposed to using traditional for loops). My original question just asked for a "string", presumably including all the commas etc. Your answer addressed what I meant. I just tried to clear up the original question in case anyone else was looking for the answer and might have gotten confused. I was able to add the other components and it is working perfectly. – MrSwaggins Dec 01 '20 at 00:28