I have a set of n m-tuples with elements from A = {0,1,2,3}
, so a(i,j) ∈ {0,1,2,3} for i ∈ M = {1,...,m} and j ∈ N = {1,...,n}
. I would like to determine the smallest subset M' of M so that the corresponding subtuples a(i',j) with i' ∈ M' and j ∈ N
are all pairwise different.
Here is an example in R:
n <- 10
m <- 20
set.seed(13)
set_of_tuples <- replicate(n, sample((0:3), m, replace = TRUE))
set_of_tuples
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 2 0 2 1 2 2 0 0 1 2
[2,] 0 2 0 2 1 2 0 0 3 0
[3,] 1 2 3 3 3 3 2 1 2 1
[4,] 0 2 2 2 1 2 2 2 3 3
[5,] 3 0 2 1 2 1 1 1 0 0
[6,] 0 2 3 1 2 0 0 3 3 3
[7,] 2 0 0 1 3 1 0 0 0 0
[8,] 3 1 1 0 1 0 0 2 3 2
[9,] 3 1 2 1 1 2 3 2 3 0
[10,] 0 2 2 1 3 2 3 2 3 3
[11,] 2 1 2 1 0 1 0 0 1 1
[12,] 3 1 1 1 3 0 3 1 2 1
[13,] 3 3 0 1 0 0 1 3 2 0
[14,] 2 3 3 0 3 1 0 2 1 0
[15,] 2 2 0 1 0 3 3 1 0 2
[16,] 1 0 1 3 1 0 3 1 0 3
[17,] 1 1 0 0 2 3 3 2 3 2
[18,] 2 0 1 1 3 1 2 2 1 2
[19,] 3 3 0 0 3 3 1 2 3 3
[20,] 2 0 0 2 3 3 0 3 1 1
Let s be the length of M'. Obviously, s ≥ n**(1/4)
since there are only 4**s different subtuples of length s. In the examples above s ≥ 2.
s <- ceiling(n**(1/4))
To test if a suitable subset M' of length 2 exists you can evaluate all possible subtuples:
pairwise_compare_subtuples <- function(i){
combn(ncol(set_of_tuples), 2, FUN = function(j) identical(set_of_tuples[i,][, j[1]], set_of_tuples[i,][, j[2]]))
}
index <- combn(nrow(set_of_tuples), s, FUN = NULL, simplify = FALSE)
tmp <- lapply(index, pairwise_compare_subtuples)
table(!unlist(lapply(tmp, any)))
FALSE TRUE
187 3
So, in the example here, there are three 2-element subsets of M for which all n 2-element subtuples are pairwise different. From the index you can tell the first one is the pair (9, 15) and, indeed, in
set_of_tuples[c(9,15),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 1 2 1 1 2 3 2 3 0
[2,] 2 2 0 1 0 3 3 1 0 2
the 10 pairs are all pairwise different.
While in principle this works fine, it gets very slow for large n and m. So I would like to ask two questions:
1) Is there a better approach to this than "trial and error", i.e. evaluating all possible subtuples?
2) If not, can the computation be made faster in R?