
This one is a bit challenging, so I have done my best to make it reproducible and to follow the guidelines.

This is related to my earlier question here, but now I want to add one more dimension. The solution needs to be VERY fast, so no loops, apply, or slow merges if possible.

Consider the below:

set.seed(1)

# 25 draws fill the 5 x 5 matrix exactly
M = matrix(rpois(25, 5), 5, 5)

v1 = c(4, 8, 3, 5, 9)
v2 = c(5, 6, 6, 11, 6)
v3 = c(5, 6, 6, 11, 6)
v4 = c(8, 6, 4, 4, 3)
v5 = c(4, 8, 3, 5, 9)
v6 = c(8, 6, 4, 4, 3)
v7 = c(3, 2, 7, 7, 4)
v8 = c(3, 2, 7, 7, 4)

row1 = c(v1, v2)
row2 = c(v3, v4)
row3 = c(v5, v6)
row4 = c(v7, v8)

Vmat = rbind(row1, row2, row3, row4)


M
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    8    3    5    9
[2,]    4    9    3    6    3
[3,]    5    6    6   11    6
[4,]    8    6    4    4    3
[5,]    3    2    7    7    4




Vmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
row1    4    8    3    5    9    5    6    6   11     6
row2    5    6    6   11    6    8    6    4    4     3
row3    4    8    3    5    9    8    6    4    4     3
row4    3    2    7    7    4    3    2    7    7     4

Each row of Vmat consists of two rows of M placed side by side.

Mentally split Vmat into 2 submatrices between columns 5 and 6 (in my real problem there are many more than 2, up to 500,000 across).

For each submatrix of Vmat, I want to find which row of M each of its rows matches.
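
To make the split concrete, here is a minimal sketch, assuming the number of columns of Vmat is an exact multiple of ncol(M). The names k, n_blocks, and blocks are illustrative only, and Vmat is re-entered by hand so the snippet runs on its own. Because R stores matrices column by column, reshaping Vmat into a 3-D array exposes each submatrix as one slice:

```r
# Vmat from the example above, entered by hand
Vmat <- rbind(c(4, 8, 3, 5, 9, 5, 6, 6, 11, 6),
              c(5, 6, 6, 11, 6, 8, 6, 4, 4, 3),
              c(4, 8, 3, 5, 9, 8, 6, 4, 4, 3),
              c(3, 2, 7, 7, 4, 3, 2, 7, 7, 4))

k <- 5                       # columns per block, i.e. ncol(M)
n_blocks <- ncol(Vmat) / k   # 2 in this example

# Slice b of the array holds columns ((b - 1) * k + 1):(b * k) of Vmat
blocks <- array(Vmat, dim = c(nrow(Vmat), k, n_blocks))

blocks[, , 1]   # same values as Vmat[, 1:5]
blocks[, , 2]   # same values as Vmat[, 6:10]
```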

The output should therefore be:



      [,1] [,2]
[1,]    1    3
[2,]    3    4
[3,]    1    4
[4,]    5    5

One idea for a first pass: stack the submatrices of Vmat on top of each other as in this question, do the row lookups in one go, then reshape the result.
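
The stack, look up, reshape idea might be sketched roughly as follows. This is a sketch under stated assumptions, not a benchmarked solution: match_blocks is a hypothetical helper name, the data are re-entered by hand, ncol(Vmat) is assumed to be a multiple of ncol(M), the rows of M are assumed to be unique (match returns only the first hit), and the values are assumed to be integers so that paste builds identical keys on both sides. Blocks with no matching row in M come back as NA.

```r
# M and Vmat from the example above, entered by hand
M <- rbind(c(4, 8, 3, 5, 9),
           c(4, 9, 3, 6, 3),
           c(5, 6, 6, 11, 6),
           c(8, 6, 4, 4, 3),
           c(3, 2, 7, 7, 4))
Vmat <- rbind(c(4, 8, 3, 5, 9, 5, 6, 6, 11, 6),
              c(5, 6, 6, 11, 6, 8, 6, 4, 4, 3),
              c(4, 8, 3, 5, 9, 8, 6, 4, 4, 3),
              c(3, 2, 7, 7, 4, 3, 2, 7, 7, 4))

# Hypothetical helper: one vectorized lookup for all blocks at once
match_blocks <- function(Vmat, M) {
  k <- ncol(M)
  stopifnot(ncol(Vmat) %% k == 0)

  # Stack: t(Vmat) read column by column is row 1 of Vmat, then row 2, ...
  # so each column of `blocks` is one k-wide block, ordered
  # (row 1, block 1), (row 1, block 2), (row 2, block 1), ...
  blocks <- matrix(t(Vmat), nrow = k)

  # Look up: collapse every block and every row of M into one string key
  key_v <- do.call(paste, c(as.data.frame(t(blocks)), sep = "_"))
  key_m <- do.call(paste, c(as.data.frame(M), sep = "_"))
  idx <- match(key_v, key_m)

  # Reshape: one row per row of Vmat, one column per block
  matrix(idx, nrow = nrow(Vmat), byrow = TRUE)
}

result <- match_blocks(Vmat, M)
result
#      [,1] [,2]
# [1,]    1    3
# [2,]    3    4
# [3,]    1    4
# [4,]    5    5
```

The only per-element work here is paste and match, both of which are vectorized; whether that is fast enough at 500,000 blocks would need to be measured.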

wolfsatthedoor

1 Answer


One possible solution:

getMatchingVec <- function(Vmat, M){

  # helper function: split Vmat into submatrices of ncol(M) columns each
  getVmatSubset <- function(Vmat, M){
    lapply(seq_len(ncol(Vmat) / ncol(M)),
           FUN = function(x, y = ncol(M), mat = Vmat){
             i <- (x - 1) * y + 1   # first column of block x
             j <- (x - 1) * y + y   # last column of block x
             subset(mat, select = i:j)
           })
  }

  # lapply over the submatrices, apply over each row:
  # return the row(s) of M whose entries all equal the row x
  resultList <-
    lapply(getVmatSubset(Vmat, M),
           FUN = function(sVmat, matM = M)
             apply(sVmat, 1, FUN = function(x, mat = matM)
               which(colSums(t(mat) == x) == ncol(mat))))
  do.call(cbind, resultList)
}


# function call
getMatchingVec(Vmat, M)
holzben
  • I think the problem with lapply is that it will be too slow, not much better than syntactic sugar for a for loop – wolfsatthedoor Sep 27 '15 at 16:40
  • 1) If you call ``lapply`` syntactic sugar, you had better invest time in understanding the language features of R. 2) If the solution is too slow, substitute ``lapply`` with ``mclapply`` from the ``parallel`` package and run it in parallel – holzben Sep 27 '15 at 17:07
  • Please see this question as to whether lapply is syntactic sugar: http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – wolfsatthedoor Sep 27 '15 at 21:55
  • Obviously I cannot afford to pay the fixed cost of parallelization in this example and need a vectorized solution – wolfsatthedoor Sep 27 '15 at 21:55