-1

Sorry if this has been posted before. I looked for the answer both on Google and Stackoverflow and couldn't find a solution.

Right now I have two matrices of data in R. I am trying to loop through each row in the matrix, and find the row in the other matrix that is most similar by some distance metric (for now least squared). I figured out one method but it is O(n^2) which is prohibitive for my data.

I think this might be similar to some dictionary learning techniques but I couldn't find anything.

Thanks!

Both matrices are just 30 by n matrices with a number at each entry.

distance.fun=function(mat1,mat2){   
  match=c()  
  for (i in 1:nrow(mat1)){  

    if (all(is.na(mat1[i,]))==FALSE){  
    dist=c()  

    for (j in 1:nrow(mat2)){  
      dist[j]=sum((mat1[i,]-mat2[j,])^2)  
      match[i]=which(min(dist) %in% dist)  
    }  
    }  
  }  
  return(match)  
}
  • Could you post what you have tried? – bencripps Sep 10 '14 at 01:16
  • This question could benefit from a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) showing sample input data that is representative of your actual data as well as the desired output. But if you want to minimize distance between two matrixes with `n` rows the number of distances you will have to create is on the order of `O(n^2)`. That doesn't seem avoidable unless you want to change your matching criteria. if you need help with statistical methods, try [stats.se] instead. – MrFlick Sep 10 '14 at 01:33
  • @bencripps Here is a really simple version of what I have right now. It is imperfect because I wanted to check if there might be a less expensive way before fine tuning it. – Bryan David Sep 10 '14 at 02:26

1 Answers1

0

A better strategy would be to compute the distance matrix all at once first, then extract the mins. Here's an example using simualted data

set.seed(15)
mat1<-matrix(runif(2*25), ncol=2)
mat2<-matrix(runif(2*25), ncol=2)

and here's a helper function that can calculate the distances between values in one matrix to another. It uses the built in dist function but it does do unnecessary within-group comparisons that we eventually have to filter out, still it may be better performing overall.

distab<-function(m1, m2) {
    stopifnot(ncol(m1)==ncol(m2))
    m<-as.matrix(dist(rbind(m1, m2)))[1:nrow(m1), -(1:nrow(m1))]
    rownames(m)<-rownames(m1)
    colnames(m)<-rownames(m2)
    m
}

mydist<-distab(mat1, mat2)

now that we have the between-group distances, we just need to minimize the matches.

best <- apply(mydist, 2, which.min)
rr <- cbind(m1.row=seq.int(nrow(mat1)), best.m2.row = best)
head(rr)   #just print a few

#      m1.row best.m2.row
# [1,]      1           1
# [2,]      2          14
# [3,]      3           7
# [4,]      4           3
# [5,]      5          23
# [6,]      6          15

note that with a strategy like this (we well as with your original implementation) it is possible for multiple rows from mat1 to match to the same row in mat2 and some rows in mat2 to be unmatched to mat1.

MrFlick
  • 195,160
  • 17
  • 277
  • 295