combinations of combinations in R

Question

Say I have two vectors

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

Each of these will be used to look up a number in another vector. I'm looking to find all possible sets of two ratios (all possible sets of four variables, two from each vector), where the numerator is always from upVariables, the denominator is always from downVariables, and the final set doesn't use the same variable twice.
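For reference, the number of sets being asked for can be worked out directly: choose(5, 2) = 10 ways to pick the two numerators, 10 ways to pick the two denominators, and 2! = 2 ways to match them one-to-one within each set of four. A small base-R sketch (the function name `countRatioSets` is hypothetical):

```r
# Count the sets described in the question: k numerators chosen from nUp
# up-variables, k denominators chosen from nDown down-variables, and the
# number of one-to-one ways to pair the chosen numerators with denominators.
countRatioSets <- function(nUp, nDown, k) {
  sets <- choose(nUp, k) * choose(nDown, k)        # which 2k variables are used
  c(sets = sets, groupings = sets * factorial(k))  # k! pairings per set
}

countRatioSets(5, 5, 2)  # sets 100, groupings 200
countRatioSets(5, 5, 3)  # sets 100, groupings 600
```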

I've got as far as

library(dplyr)  # for arrange()
upCombos<-combn(upVariables,2)
downCombos<-combn(downVariables,2)
combos<-arrange(expand.grid(upCombos=upCombos[,1],downCombos=downCombos[,1]),upCombos)

I'm only using the first possible combination here, to illustrate, but I'd want to iterate over all possible combinations. This gives me:

> combos
  upCombos downCombos
1      up1      down1
2      up1      down2
3      up2      down1
4      up2      down2

What I'd like to produce from this though is two sets, something like:

> combos[1]
  upCombos downCombos
1      up1      down1
2      up2      down2

and

> combos[2]
  upCombos downCombos
1      up1      down2
2      up2      down1

So that in each case, each value from upCombos is used only once and each value from downCombos is used only once. Does that make sense? Any ideas on how one goes about this?

Ideally I'd like to then be able to generalize to sets of 3 sampled from the original vectors rather than sets of 2, but I'll be happy to get sets of 2 working for now.

Edit: Jota has provided a solution that produces the arrangements within any one group of 4 variables (2 from upVariables, 2 from downVariables). I still can't see how to iterate over all possible sets of 4 variables, though. The nearest I've got is to put Jota's suggestion inside two for loops (spot the not-yet-R-programmer). This returns many fewer combinations than there should be.

n<-2
offset<-n-1
ratioGroups<-c()  # collect the results here
for (i in 1:(length(upVariables)-offset)){
  for (k in 1:(length(downVariables)-offset)){
    combos <- expand.grid(upVariables[i:(i+offset)], downVariables[k:(k+offset)])
    combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
    mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
    for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
    pairs<-split(combos[c(mat), ], rep(1:n, each = n))
    collapsed<-sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
    ratioGroups<-c(ratioGroups,collapsed)
  }
}

This returns only 16 sets of variables (each with 2 arrangements, so 32 in all). With 5 variables in each vector, though, there are many more possibilities.
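The 16 comes from the loop bounds: i:(i + offset) takes a window of consecutive elements, so with offset = 1 only neighbouring variables (up1/up2, up2/up3, up3/up4, up4/up5) are ever grouped, giving 4 * 4 = 16 sets rather than the choose(5, 2)^2 = 100 sets built from every unordered pair. Comparing the two in base R:

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")

# What the loop visits: i:(i + offset) with offset = 1 is a sliding window,
# so only neighbouring variables are ever paired.
windows <- lapply(1:(length(upVariables) - 1), function(i) upVariables[i:(i + 1)])
length(windows)   # 4

# What the question asks for: every unordered pair.
allPairs <- combn(upVariables, 2, simplify = FALSE)
length(allPairs)  # 10

# Across both vectors: 16 sets visited versus 100 possible.
length(windows)^2
length(allPairs)^2
```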

score 0 · Answer 1 · answered Nov 15 '16 at 03:54

You could use expand.grid to create the combinations and prepare subsets with regular expressions:

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

DF = expand.grid(upVariables,downVariables)

DF$suffix1 = as.numeric(unlist(regmatches(DF$Var1,gregexpr("[0-9]+",DF$Var1))))

DF$suffix2 = as.numeric(unlist(regmatches(DF$Var2,gregexpr("[0-9]+",DF$Var2))))

head(DF)
#  Var1  Var2 suffix1 suffix2
#1  up1 down1       1       1
#2  up2 down1       2       1
#3  up3 down1       3       1
#4  up4 down1       4       1
#5  up5 down1       5       1
#6  up1 down2       1       2



DF_Comb1 = DF[DF$suffix1==DF$suffix2,]
DF_Comb2 = DF[DF$suffix1!=DF$suffix2,]

DF_Comb1
#    Var1  Var2 suffix1 suffix2
# 1   up1 down1       1       1
# 7   up2 down2       2       2
# 13  up3 down3       3       3
# 19  up4 down4       4       4
# 25  up5 down5       5       5


head(DF_Comb2)
#  Var1  Var2 suffix1 suffix2
# 2  up2 down1       2       1
# 3  up3 down1       3       1
# 4  up4 down1       4       1
# 5  up5 down1       5       1
# 6  up1 down2       1       2
# 8  up3 down2       3       2
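As a side note on the suffix extraction: if every name is letters followed by a number (an assumption that holds for the vectors in the question), one sub() call per column does the same job as the regmatches()/gregexpr() pair, and stringsAsFactors = FALSE keeps the name columns as character:

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# stringsAsFactors = FALSE keeps Var1/Var2 as character columns.
DF <- expand.grid(upVariables, downVariables, stringsAsFactors = FALSE)

# Drop the leading non-digit characters, leaving the numeric suffix
# (assumes names look like <letters><digits>).
DF$suffix1 <- as.numeric(sub("^[^0-9]+", "", DF$Var1))
DF$suffix2 <- as.numeric(sub("^[^0-9]+", "", DF$Var2))

DF_Comb1 <- DF[DF$suffix1 == DF$suffix2, ]  # 5 rows
DF_Comb2 <- DF[DF$suffix1 != DF$suffix2, ]  # 20 rows
```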

Jota · Answer 2 · 2016-12-03T00:40:26.893

Here's what I came up with in response to the comments and the edited question.

# create combos and order them according to the first variable
combos <- expand.grid(upVariables[1:2], downVariables[1:2])
combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
# if names are important, set them:
# names(combos) <- c("upCombos", "downCombos")

# create a matrix to use to sort combos
mat <- matrix(1:2^2, byrow = TRUE, nrow = 2)
# take some code from Carl Witthoft to shift the above matrix
# from: http://stackoverflow.com/a/24144632/640595
for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]

# use the matrix to sort combos, and then conduct the splitting
initialResult <- split(combos[c(mat), ], rep(1:2, each = 2))

$`1`
  Var1  Var2
1  up1 down1
4  up2 down2

$`2`
  Var1  Var2
3  up1 down2
2  up2 down1
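A point worth checking before generalizing to sets of 3: the shifted matrix yields n groupings, one per cyclic rotation of the denominators, whereas n numerators can be matched to n denominators in n! ways. For n = 2 the counts coincide (2 = 2!), but for n = 3 the rotation gives 3 of the 6. A quick check, with `shiftMatrix()` as a hypothetical wrapper around the loop above:

```r
# Hypothetical helper wrapping the shifting loop from the answer
# (assumes n >= 2, since 2:nrow(mat) would count down for n = 1).
shiftMatrix <- function(n) {
  mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
  for (j in 2:nrow(mat)) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
  mat
}

n <- 3
combos <- expand.grid(paste0("up", 1:n), paste0("down", 1:n),
                      stringsAsFactors = FALSE)
combos <- combos[order(combos$Var1), ]  # order() keeps ties in original order
groups <- split(combos[c(shiftMatrix(n)), ], rep(1:n, each = n))

length(groups)  # 3 groupings from the shifted matrix
factorial(n)    # 6 one-to-one pairings exist for n = 3
```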

To generate the rest of the combinations, we can iterate through and replace the up variables and down variables:

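# use fixed-string replacement from the stringi package to produce the rest of the combinations.
library(stringi)
# upCombos and downCombos are the 2 x 10 matrices from the question
upCombos <- combn(upVariables, 2)
downCombos <- combn(downVariables, 2)
# convert from factor to character for easier manipulation
initialResult <- lapply(initialResult, sapply, as.character)

# iterate through the columns of upCombos.
# With vectorize_all = FALSE the patterns are applied one after another, so
# "up2" is replaced first: its replacement is never "up1", so the second
# pattern cannot overwrite a value that was just substituted.
intermediateResult <- lapply(seq_len(ncol(upCombos)), 
    function(ii) {
        jj <- stri_replace_all_fixed(unlist(initialResult), 
            pattern = c("up2", "up1"), 
            replacement = upCombos[2:1, ii], vectorize_all = FALSE)
        relist(jj, initialResult)})

# iterate through columns of downCombos (same ordering trick as above)
finalResult <- lapply(seq_len(ncol(downCombos)), 
    function(ii) {
        jj <- stri_replace_all_fixed(unlist(intermediateResult), 
            pattern = c("down2", "down1"), 
            replacement = downCombos[2:1, ii], vectorize_all = FALSE)
        relist(jj, intermediateResult)})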
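Sequential replacement, which is what vectorize_all = FALSE does, can rewrite a value it has just inserted: with patterns "down1", "down2" and replacements "down2", "down3", an original down1 becomes down2 and then down3. One way to sidestep pattern ordering altogether, sketched here in base R with a hypothetical helper `relabel()`, is to look each value up by exact match so every element is replaced at most once:

```r
# Replace values by exact lookup: each element is mapped at most once,
# so a value that has just been substituted is never substituted again.
# Works on vectors and matrices (dimensions are preserved).
relabel <- function(x, from, to) {
  idx <- match(x, from)
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]
  x
}

relabel(c("up1", "down1", "up2", "down2"),
        from = c("down1", "down2"),
        to   = c("down2", "down3"))
# "up1"   "down2" "up2"   "down3"
```

Inside the answer's lapply() calls, something like lapply(initialResult, relabel, from = c("up1", "up2"), to = upCombos[, ii]) would then do the up-variable step without depending on pattern order.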

So that works to get all the combinations for any given set of variables. How do I change it so that it iterates over all possible combinations of up1:up5 and down2:down5? My first thought was to put this inside two for loops, i.e. for (i in 1:(length(upVariables)-offset)){ etc. This seems a) not very R-like (I'm sure there's a better way to do this) and b) it doesn't seem to produce anywhere near as many combinations as I thought there would be. — Ben, Nov 17 '16 at 03:37
I was referring to the bit at the top of my question where I mentioned I had two sets of variables, upVariables and downVariables, and that I wanted to get all possible combinations of 4 variables (2 from each). I was using the first set of four to demonstrate what I wanted to do with each individual set. Your answer works brilliantly for arranging the variables within each set of 4. I thought I'd be able to extend it to iterate over all possible sets, but I haven't managed to. I'll go back and see if I can word the question better. — Ben, Nov 17 '16 at 03:54

score 0 · Accepted Answer · answered Nov 17 '16 at 06:17

So I think I may have cracked it. I've pillaged a couple of answers to other questions. One provides a function called expand.grid.unique, which removes duplicates if you pass the same vector to expand.grid twice. Another provides expand.grid.df, which I'm not even going to pretend to understand, and which extends expand.grid to work on data frames. Combined, they do what I want them to do.

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
ratioGroups<-data.frame(matrix(ncol=2, nrow=0))
colnames(ratioGroups)<-c("mix1","mix2")

ups<-expand.grid.unique(upVariables,upVariables)
downs<-expand.grid.unique(downVariables,downVariables)
comboList<-expand.grid.df(ups,downs)
comboList <- data.frame(lapply(comboList, as.character), stringsAsFactors=FALSE)
colnames(comboList)<-c("u1","u2","d1","d2")
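If the two borrowed helpers are not at hand, a similar 100-row table can be built with base R alone. This is a sketch assuming the goal is every unordered pair from each vector (which is what combn() produces); it is not the helpers' own code, and stringsAsFactors = FALSE avoids the conversion back from factors:

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# Every unordered pair from each vector, one pair per row (10 x 2 matrices).
ups <- t(combn(upVariables, 2))
downs <- t(combn(downVariables, 2))

# Cross every up-pair with every down-pair by row index.
idx <- expand.grid(u = seq_len(nrow(ups)), d = seq_len(nrow(downs)))
comboList <- data.frame(ups[idx$u, , drop = FALSE],
                        downs[idx$d, , drop = FALSE],
                        stringsAsFactors = FALSE)
colnames(comboList) <- c("u1", "u2", "d1", "d2")

nrow(comboList)  # 100
```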

There's a bunch of faffing about in there converting everything back to strings because everything gets converted to factors for some reason.

If I put Jota's answer into a function:

getGroups<-function(line){
  n<-2 # the number of ratios being used
  combos <- expand.grid(as.character(line[seq_len(n)]), as.character(line[n + seq_len(n)]))
  combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
  mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
  for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
  pairs<-split(combos[c(mat), ], rep(1:n, each = n))
  collapsed<-sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
  collapsed  # return the n groupings as "up_down-up_down" strings
}

I can then use

ratioGroups<-as.vector(apply(comboList,1,getGroups))

to return a character vector of all possible groupings. I'm guessing this still isn't the best way to achieve my larger goal, but it's getting there.
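For the sets-of-3 generalization mentioned in the question, one possible base-R sketch skips the shifted matrix and enumerates every one-to-one matching as a permutation of the chosen denominators, so it yields k! groupings per set of 2k variables. `ratioGroupsFor()` and `permutations()` are hypothetical helpers, and the output uses the same "up_down-up_down" strings as getGroups():

```r
# Hypothetical helper: all permutations of a vector, returned as a list.
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in permutations(x[-i])) out[[length(out) + 1]] <- c(x[i], rest)
  }
  out
}

# Every grouping of k ratios: choose k numerators and k denominators, then
# match them one-to-one in every possible order.
ratioGroupsFor <- function(upVariables, downVariables, k) {
  ups <- combn(upVariables, k, simplify = FALSE)
  downs <- combn(downVariables, k, simplify = FALSE)
  unlist(lapply(ups, function(u) {
    lapply(downs, function(d) {
      vapply(permutations(d),
             function(p) paste(u, p, sep = "_", collapse = "-"),
             character(1))
    })
  }))
}

upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")
length(ratioGroupsFor(upVariables, downVariables, 2))  # 200
length(ratioGroupsFor(upVariables, downVariables, 3))  # 600
head(ratioGroupsFor(upVariables, downVariables, 2), 2)
# "up1_down1-up2_down2" "up1_down2-up2_down1"
```

Looping over index combinations with combn() rather than sliding windows is what brings the count up to choose(5, k)^2 sets.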
