1

I have a set of n m-tuples with elements from A = {0,1,2,3}, so a(i,j) ∈ {0,1,2,3} for i ∈ M = {1,...,m} and j ∈ N = {1,...,n}. I would like to determine the smallest subset M' of M so that the corresponding subtuples a(i',j) with i' ∈ M' and j ∈ N are all pairwise different.

Here is an example in R:

n <- 10
m <- 20

set.seed(13)

set_of_tuples <- replicate(n, sample((0:3), m, replace = TRUE))

set_of_tuples

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    0    2    1    2    2    0    0    1     2
 [2,]    0    2    0    2    1    2    0    0    3     0
 [3,]    1    2    3    3    3    3    2    1    2     1
 [4,]    0    2    2    2    1    2    2    2    3     3
 [5,]    3    0    2    1    2    1    1    1    0     0
 [6,]    0    2    3    1    2    0    0    3    3     3
 [7,]    2    0    0    1    3    1    0    0    0     0
 [8,]    3    1    1    0    1    0    0    2    3     2
 [9,]    3    1    2    1    1    2    3    2    3     0
[10,]    0    2    2    1    3    2    3    2    3     3
[11,]    2    1    2    1    0    1    0    0    1     1
[12,]    3    1    1    1    3    0    3    1    2     1
[13,]    3    3    0    1    0    0    1    3    2     0
[14,]    2    3    3    0    3    1    0    2    1     0
[15,]    2    2    0    1    0    3    3    1    0     2
[16,]    1    0    1    3    1    0    3    1    0     3
[17,]    1    1    0    0    2    3    3    2    3     2
[18,]    2    0    1    1    3    1    2    2    1     2
[19,]    3    3    0    0    3    3    1    2    3     3
[20,]    2    0    0    2    3    3    0    3    1     1

Let s be the length of M'. Obviously, s ≥ n**(1/4) since there are only 4**s different subtuples of length s. In the examples above s ≥ 2.

s <- ceiling(n**(1/4))

To test if a suitable subset M' of length 2 exists you can evaluate all possible subtuples:

pairwise_compare_subtuples <- function(i){
  combn(ncol(set_of_tuples), 2, FUN = function(j) identical(set_of_tuples[i,][, j[1]], set_of_tuples[i,][, j[2]]))
}


index <- combn(nrow(set_of_tuples), s, FUN = NULL, simplify = FALSE)

tmp <- lapply(index, pairwise_compare_subtuples)

table(!unlist(lapply(tmp, any)))

FALSE  TRUE 
  187     3

So, in the example here, there are three 2-element subsets of M for which all n 2-element subtuples are pairwise different. From the index you can tell the first one is the pair (9, 15) and, indeed, in

set_of_tuples[c(9,15),]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3    1    2    1    1    2    3    2    3     0
[2,]    2    2    0    1    0    3    3    1    0     2

the 10 pairs are all pairwise different.

While in principle this works fine, it gets very slow for large n and m. So I would like to ask two questions:

1) Is there a better approach to this than "trial and error", i.e. evaluating all possible subtuples?

2) If not, can the computation be made faster in R?

Psidom
  • 209,562
  • 33
  • 339
  • 356
mce
  • 107
  • 1
  • 6
  • 1
    Here is a faster version of [combn](http://stackoverflow.com/questions/26828301/faster-version-of-combn) (if that makes any difference) – akrun Aug 20 '16 at 16:30
  • This problem is unfortunately NP-hard (you can reduce the NP-hard Set Cover problem to it, though I don't have the time to do that right now unfortunately). So in general, expect an exponential worst-case execution time. – j_random_hacker Aug 20 '16 at 18:27
  • @ akrun: I would like to give this a try. Unfortunately, combnPrim doesn't take a function as an argument in the same way as combn does. So I modified my code like this: `index_i <- combnPrim(m, s, simplify = FALSE) index_j <- combnPrim(n, 2, simplify = FALSE) create_subtuples <- function(i){ set_of_tuples[i,] } list_of_subtuples <- lapply(index_i, create_subtuples) pairwise_compare_subtuples <- function(x,j){ lapply(j, identical(x[,j[[1]]], x[,j[[2]]])) } tmp <- lapply(list_of_subtuples, pairwise_compare_subtuples, j=index_j)` – mce Aug 21 '16 at 08:13
  • @ akrun: Sorry, unable to finish the above. In any case I get an error 'Error in match.fun(FUN) : 'identical(x[, j[[1]]], x[, j[[2]]])' is not a function, character or symbol' which I don't understand. Maybe you can help. Thanks, mce – mce Aug 21 '16 at 08:23

0 Answers0