1

Following up on this question I posted some days ago, I would like to extend the same case for combinations of every length.

So I have a vector of the form:

markers <- LETTERS[1:5]

Originally I just wanted all possible combinations of conditions + and - for all the markers; i.e "lowest hierarchy level" of combinations of 5.

So applying the answer to the above question, I obtained the following:

 [1] "A+/B+/C+/D+/E+" "A-/B+/C+/D+/E+" "A+/B-/C+/D+/E+" "A-/B-/C+/D+/E+" "A+/B+/C-/D+/E+" "A-/B+/C-/D+/E+" "A+/B-/C-/D+/E+"
 [8] "A-/B-/C-/D+/E+" "A+/B+/C+/D-/E+" "A-/B+/C+/D-/E+" "A+/B-/C+/D-/E+" "A-/B-/C+/D-/E+" "A+/B+/C-/D-/E+" "A-/B+/C-/D-/E+"
[15] "A+/B-/C-/D-/E+" "A-/B-/C-/D-/E+" "A+/B+/C+/D+/E-" "A-/B+/C+/D+/E-" "A+/B-/C+/D+/E-" "A-/B-/C+/D+/E-" "A+/B+/C-/D+/E-"
[22] "A-/B+/C-/D+/E-" "A+/B-/C-/D+/E-" "A-/B-/C-/D+/E-" "A+/B+/C+/D-/E-" "A-/B+/C+/D-/E-" "A+/B-/C+/D-/E-" "A-/B-/C+/D-/E-"
[29] "A+/B+/C-/D-/E-" "A-/B+/C-/D-/E-" "A+/B-/C-/D-/E-" "A-/B-/C-/D-/E-"

Now I want to extend this to "upper hierarchy" levels of combinations of 1, 2, 3, and 4 markers. So I would get something like:

"A+"
"A-"
"B+"
"B-"
"C+"
"C-"
...
"A+/B+"
"A-/B+"
"A+/B-"
"A-/B-"
"B+/C+"
"B+/C-"
"B-/C+"
"B-/C-"
...
"A+/B+/C+"
"A-/B+/C+"
...
"A+/B+/C+/D+/E+"
"A-/B+/C+/D+/E+"
"A+/B-/C+/D+/E+"
"A-/B-/C+/D+/E+"
"A+/B+/C-/D+/E+"
...

What would be the fastest optimal way to build on top of the accepted answer to the previous question?

It doesn't have to be done in one shot, it would still be ok (or even better), to get the "inner nodes" from the previous results of groups of 5. Maybe working on the expand.grid intermediate result.

Any idea? Thanks!

EDIT

The best way for my intentions would be to actually keep a place holder for all the markers in the higher hierarchy combinations.

So for example in this case A+/D- would become A+/NA/NA/D-/NA

EDIT 2

Even the first answer to create all the possible n-size combinations (including NA) from scratch is really good... in my real world scenario I have the chance to retrieve a much smaller filtered list of the "lowest hierarchy level" combinations of 5 "markers" that I would be most interested in.

In this scenario, it would be really good to have the option to extract the "upper level nodes" of combinations of 1,2,3,4...n (with NA) from that filtered list (instead of generating all possible n-size combinations from scratch)...

Any idea?

DaniCee
  • 2,397
  • 6
  • 36
  • 59
  • This answers would probably be helpful: https://stackoverflow.com/questions/40049313/generate-all-combinations-of-all-lengths-in-r-from-a-vector It will give all combinations of letters of any size. Then you can just feed those sets of letters into the solution from the other question you asked. – MrFlick Oct 19 '20 at 04:31

2 Answers2

2

If you still wanted to keep the NA values in there, then just think of it as having a different value than "+" or "-", you just also have the NA value. You could do something like

markers <- LETTERS[1:5]

test <- expand.grid(lapply(seq(markers), function(x) c("+","-","NA")),stringsAsFactors=FALSE)

apply(test,1,function(x){paste0(ifelse(x=="NA", "NA", markers),ifelse(x=="NA","",x),collapse = "/")}) 
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • ok this is smart, I think probably the best solution... let me test it out with my real data first – DaniCee Oct 19 '20 at 07:37
  • Ok this is not really working for my real life case, cause I can actually obtain from previous steps a pre-filtered list of "lowest hierarchy level" combinations of 5 "markers"... So I would really need to get the internal "nodes" of 1,2,3,4...n combinations out of those, instead of defining all from the beginning – DaniCee Oct 21 '20 at 08:57
  • Let me post an EDIT and see if anyone can help. Thanks! – DaniCee Oct 21 '20 at 08:58
  • I left this accepted answer and opened a new question so it is not so messy -> https://stackoverflow.com/questions/64474835/r-extract-inner-higher-level-combinations-groups-of-1-2-3-and-4-elements-o – DaniCee Oct 22 '20 at 03:28
2

Building off of my previous answer:

library(RcppAlgos)

plusMinusCombs <- function(n) {
    unlist(lapply(1:n, function(x) {
        comboGeneral(n, x, FUN = function(comb) {
            permuteGeneral(c("+", "-"), x, repetition = TRUE, FUN = function(y) {
                res <- rep(NA_character_, n)
                res[comb] <- paste0(LETTERS[comb], y)
                paste(res, collapse = "/")
            })
        })
    }))
}

Note, the above does not give the case with all NAs. For example with n <- 4, the above code will not give "NA/NA/NA/NA". If you really need this case, you can alter the code above so that is will append this result.

It is efficient as well (fun2 is the function provided by @MrFlick):

fun2 <- function(n) {
    markers <- LETTERS[1:n]
    test <- expand.grid(lapply(seq(markers), function(x) c("+","-","NA")),stringsAsFactors=FALSE)        
    apply(test,1,function(x){paste0(ifelse(x=="NA", "NA", markers),ifelse(x=="NA","",x),collapse = "/")}) 
}

library(microbenchmark)
microbenchmark(plusMinusCombs(6), fun2(6))
Unit: milliseconds
             expr       min        lq      mean    median        uq      max neval
plusMinusCombs(6)  6.094728  6.601576  8.513207  6.808835  7.146834 34.95683   100
          fun2(6) 12.909009 13.890408 18.250859 14.233292 18.461800 64.42103   100

I italicize "efficient" above to point out that as stated previously, doing this type of work with characters (especially the string manipulation via paste), no matter what method you choose, will perform terribly when compared to some equivalent numerical mapping.

For example, just removing the string characteristics above gets you an increase in efficiency by a factor of about 4. The below creates an object with a one-to-one mapping to the actual desired results, hence the suffix isomorphic:

isomorphicInteger <- function(n) {
    lapply(1:n, function(x) {
        comboGeneral(n, x, FUN = function(comb) {
            permuteGeneral(c(1L, -1L), x, repetition = TRUE, FUN = function(y) {
                res <- integer(n)
                res[comb] <- comb * y
                res
            })
        })
    })
}

microbenchmark(plusMinusCombs(6), fun2(6), isomorphicInteger(6))
Unit: milliseconds
                expr       min        lq      mean    median        uq       max neval
   plusMinusCombs(6)  6.348778  6.716615  9.433772  6.864476  7.194432 44.743407   100
             fun2(6) 13.927647 14.439699 17.775887 14.679669 18.480845 78.450272   100
isomorphicInteger(6)  1.662479  1.753788  2.290239  1.797843  1.897838  8.679857   100

There is still a lot of room for improvement. This was just to demonstrate that if you are really concerned about performance, you might want to rethink your approach. Many times in optimization problems, finding these types of mappings are crucial.

Disclaimer: I am the author of RcppAlgos

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
  • 1
    This is very interesting! I just implemented the base solution but yes takes a huge time... I probably need to optimize and follow this approach in the end. Thanks! – DaniCee Oct 20 '20 at 02:45