0

I am trying to calculate the combinations of elements of a matrix but each element should appear only once.

The (real) matrix is symmetric, and can have more then 5 elements (up to ~2000):

o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])

#           A         B            C         D         E
# A 0.4400317 0.1715681 0.7319108946 0.3994685 0.4466997
# B 0.5190471 0.1666164 0.3430245044 0.3837903 0.9322599
# C 0.3249180 0.6122229 0.6312876740 0.8017402 0.0141673
# D 0.1641411 0.1581701 0.0001703419 0.7379847 0.8347536
# E 0.4853255 0.5865909 0.6096330935 0.8749807 0.7230507

I desire to calculate the product of all the combinations of pairs (If possible it should appear all elements:AB, CD, EF if the matrix is of 6 elements), where for each pair one letter is the column, the other one is the row. Here are some combinations:

AB, CD, E  
AC, BD, E  
AD, BC, E  
AE, BC, D  
AE, BD, C  

Where the value of the single element is just 1.

Combinations not desired:

AB, BC: Element B appears twice  
AB, AC: Element A appears twice

Things I tried:

I thought about removing the unwanted part of the matrix:

out <- which(upper.tri(o), arr.ind = TRUE)
out <- cbind.data.frame(out, value = o[upper.tri(o)])

out[, 1] <- colnames(o)[out[, 1]]
out[, 2] <- colnames(o)[out[, 2]]
#    row col     value
# 1    A   B 0.1715681
# 2    A   C 0.7319109
# 3    B   C 0.3430245
# 4    A   D 0.3994685
# 5    B   D 0.3837903
# 6    C   D 0.8017402
# 7    A   E 0.4466997
# 8    B   E 0.9322599
# 9    C   E 0.0141673
# 10   D   E 0.8347536

My attempt involves the following process:

  1. Make a copy of the matrix (out)
  2. Store first value of the first row.
  3. Remove all the pairs that involve any of the pair.
  4. Select the next pair of the resulting matrix
  5. Repeat until all rows are removed of the matrix
  6. Repeat 2:5 starting from a different row

However, this method has one big problem, it doesn't guarantee that all the combinations are stored, and it could store several times the same combination.

My expected output is a vector, where each element is the product of the values in the cell selected by the combination:

AB, CD: 0.137553

How can I extract all those combinations efficiently?

llrs
  • 3,308
  • 35
  • 68
  • It sounds like you're trying to calculate permutations. This question might be helpful: https://stackoverflow.com/questions/11095992/generating-all-distinct-permutations-of-a-list-in-r – bouncyball Jul 23 '18 at 15:31
  • No, here the order it doesn't matter: 1) `AB, CD` is the same as `CD, AB`, or even than `BA, DC` or any mix in between and 2) AB is a cell of a matrix not an element of a list, and I don't want to repeat any cell of a row or column of an element that has been already used in the combination. – llrs Jul 23 '18 at 15:38
  • can we have duplicates like AE, BD and AE, DB ? – YOLO Jul 23 '18 at 15:39
  • @YOLO, in the output list? No if you don't know that all are duplicated or you know the order of the duplicates. That's why I create the out data.frame (It will make further calculations harder) – llrs Jul 23 '18 at 15:46

2 Answers2

1

This might work. I tested this on N elements = 5 and 6.

Note that this is not optimised, and hopefully can provide a framework for you to work from. With a much larger array, I can see steps involving apply and combn being a bottleneck.

The idea here is to generate a collection of unique sets first before calculating the product of the sets from another data.frame that stores values of sets.

Unique sets are identified by counting the number of unique elements in all combination pairs. For example, if N elements = 6, we expect length(unlist(combination)) == 6. The same is true if N elements = 7 (there will only be 3 pairs plus a remainder element). In cases where N elements is odd, we can ignore the remaining, unpaired element since it is constrained by the other elements.

library(dplyr)
library(reshape2)

## some functions

unique_by_n <- function(inlist, N){
  ## select unique combinations by count 
  ## if unique, expect n = 6 if n elements = 6)
  if(N %% 2) N <- N - 1 ## for odd numbers
  return(length(unique(unlist(inlist))) == N)
}

get_combs <- function(x,xall){
  ## format and catches remainder if matrix of odd elements
  xu <- unlist(x)
  remainder <- setdiff(xall,xu) ## catch remainder if any
  xset <- unlist(lapply(x, paste0, collapse=''))
  finalset <- c(xset, remainder)
  return(finalset)
}

## make dataset
set.seed(0) ## set reproducible example
#o <- matrix(runif(25), ncol = 5, nrow = 5) ## uncomment to test 5
#dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
o <- matrix(runif(36), ncol = 6, nrow = 6)
dimnames(o) <- list(LETTERS[1:6], LETTERS[1:6])
o[lower.tri(o)] <- t(o)[lower.tri(o)] ## make matrix symmetric
n_elements = nrow(o)

#### get matrix
dat <- melt(o, varnames = c('Rw', 'Cl'), as.is = TRUE)
dat$Set <- apply(dat, 1, function(x) paste0(sort(unique(x[1:2])), collapse = ''))
## get unique sets (since your matrix is symmetric)
dat <- subset(dat, !duplicated(Set))

#### get sets
elements <- rownames(o)
allpairs <- expand.grid(Rw = elements, Cl = elements) %>% 
  filter(Rw != Cl) ## get all pairs
uniqpairsgrid <- unique(t(apply(allpairs,1,sort)))
uniqpairs <- split(uniqpairsgrid, seq(nrow(uniqpairsgrid))) ## get unique pairs
allpaircombs <- combn(uniqpairs,floor(n_elements/2)) ## get combinations of pairs
uniqcombs <- allpaircombs[,apply(allpaircombs, 2, unique_by_n, N = n_elements)] ## remove pairs with repeats
finalcombs <- apply(uniqcombs, 2, get_combs, xall=elements)

#### calculate results
res <- apply(finalcombs, 2, function(x) prod(subset(dat, Set %in% x)$value)) ## calculate product
names(res) <- apply(finalcombs, 2, paste0, collapse=',') ## add names
resdf <- data.frame(Sets = names(res), Products = res, stringsAsFactors = FALSE, row.names = NULL)
print(resdf)
#>        Sets    Products
#> 1  AB,CD,EF 0.130063454
#> 2  AB,CE,DF 0.171200062
#> 3  AB,CF,DE 0.007212619
#> 4  AC,BD,EF 0.012494787
#> 5  AC,BE,DF 0.023285088
#> 6  AC,BF,DE 0.001139712
#> 7  AD,BC,EF 0.126900247
#> 8  AD,BE,CF 0.158919605
#> 9  AD,BF,CE 0.184631344
#> 10 AE,BC,DF 0.042572488
#> 11 AE,BD,CF 0.028608495
#> 12 AE,BF,CD 0.047056905
#> 13 AF,BC,DE 0.003131029
#> 14 AF,BD,CE 0.049941770
#> 15 AF,BE,CD 0.070707311

Created on 2018-07-23 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0.9000).
nachoes
  • 144
  • 4
  • Thanks for suggesting this approach. At the moment I can test it but it seems to work – llrs Jul 24 '18 at 07:03
0

Maybe the following does what you want.
Note that I was more interested in being right than in performance.

Also, I have set the RNG seed, to have reproducible results.

set.seed(9840)    # Make reproducible results

o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])

cmb <- combn(LETTERS[1:5], 2)
n <- ncol(cmb)
res <- NULL
nms <- NULL
for(i in seq_len(n)){
  for(j in seq_len(n)[-seq_len(i)]){
    x <- unique(c(cmb[, i], cmb[, j]))
    if(length(x) == 4){
      res <- c(res, o[cmb[1, i], cmb[2, i]] * o[cmb[1, j], cmb[2, j]])
      nms <- c(nms, paste0(cmb[1, i], cmb[2, i], '*', cmb[1, j], cmb[2, j]))
    }
  }
}

names(res) <- nms

res
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • I think I follow the code except the second `for` loop, why do you omit from 1 to `i`? It works when there are only 2 pairs, but if the matrix of 6, then there are three pairs (all elements should appear on the combination whenever it is possible). Do you think it is possible to expand it to include as many pairs as needed (I'll edit the question) ? – llrs Jul 23 '18 at 15:55
  • 1
    @Llopis If I don't omit from `1` to `i` it will repeat the column and row names. – Rui Barradas Jul 23 '18 at 17:22