
I have a set of point coordinates and I want to use it to generate a matrix of distances. More specifically, I have two sets of points, A of size n and B of size m, given as 2D coordinates, and I want a matrix containing all Euclidean distances between points from A and points from B, and no other distances.

Edit: what if the situation is more complicated? Suppose I have my matrix, but now I want to divide each row of it by the sum of the Euclidean distances from that row's point in A to all the points in set B; that is, normalise each row of distances. Is there an efficient way to do that?

user132290
  • It doesn't look like you learned much about posting questions on SO [since your previous question](http://stackoverflow.com/questions/27676815/very-specific-vectorisation-in-r)... Though you could take a look at `?dist` – David Arenburg Dec 28 '14 at 15:09
  • What exactly is wrong with my question? Can you specify? I'll know for future reference. – user132290 Dec 28 '14 at 15:38
  • Read [**this**](http://stackoverflow.com/help/how-to-ask) and [**this**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Dec 28 '14 at 15:39

2 Answers

set.seed(101)
n <- 10; m <- 20
A <- data.frame(x=runif(n),y=runif(n))
B <- data.frame(x=runif(m),y=runif(m))

We want

sqrt((x_{1,i}-x_{2,j})^2+(y_{1,i}-y_{2,j})^2)

for every i=1:n and j=1:m.

You can do this via

dists <- sqrt(outer(A$x,B$x,"-")^2 + outer(A$y,B$y,"-")^2)

which in this case is a 10x20 matrix. In words, we're finding the difference ("-" is a reference to the subtraction operator) between each pair of x values and each pair of y values, squaring, adding, and taking the square root.
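As a sanity check on what `outer()` is doing here, the following is a minimal sketch that compares the vectorised result with an explicit double loop. It uses smaller sizes (n = 3, m = 4) than above purely for readability; the names `dx`, `dy` and `dists_loop` are illustrative and not part of the original answer.

```r
set.seed(101)
n <- 3; m <- 4   # smaller than above, for illustration only
A <- data.frame(x = runif(n), y = runif(n))
B <- data.frame(x = runif(m), y = runif(m))

# outer(u, v, "-") returns a length(u) x length(v) matrix
# whose [i, j] entry is u[i] - v[j]
dx <- outer(A$x, B$x, "-")
dy <- outer(A$y, B$y, "-")
dists <- sqrt(dx^2 + dy^2)

# The same distances computed one pair at a time
dists_loop <- matrix(NA_real_, nrow = n, ncol = m)
for (i in seq_len(n)) {
  for (j in seq_len(m)) {
    dists_loop[i, j] <- sqrt((A$x[i] - B$x[j])^2 + (A$y[i] - B$y[j])^2)
  }
}

all.equal(dists, dists_loop)
```

Row i of `dists` corresponds to the i-th point of A and column j to the j-th point of B.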

If you want to normalize every row by its sum, I would suggest

norm.dists <- sweep(dists,MARGIN=1,STATS=rowSums(dists),FUN="/")
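To see what the `sweep()` call does, here is a small sketch on a stand-in matrix (random values, not the actual distances). It also shows that plain division by `rowSums()` gives the same result, because R recycles the divisor down the columns; the names `norm_sweep` and `norm_div` are illustrative.

```r
set.seed(101)
dists <- matrix(runif(6), nrow = 2)   # stand-in for a 2 x 3 distance matrix

# Divide every entry in row i by the sum of row i
norm_sweep <- sweep(dists, MARGIN = 1, STATS = rowSums(dists), FUN = "/")

# Equivalent here: a vector of length nrow(dists) is recycled
# column by column, so entry [i, j] is divided by rowSums(dists)[i]
norm_div <- dists / rowSums(dists)

all.equal(norm_sweep, norm_div)
rowSums(norm_sweep)   # each row now sums to 1 (up to floating point)
```

The `sweep()` form states the margin explicitly, which makes it harder to divide by column sums by accident.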
Ben Bolker
  • Could you explain what exactly the "-" bit means and what else one could put there? – user132290 Dec 28 '14 at 15:24
  • Just out of curiosity: what would I do, if I wanted a function different from distance, just an arbitrary function of the coordinates? – user132290 Dec 28 '14 at 15:37
  • If you can decompose it into a pairwise operation on scalars, then you can use `outer()` with a different function. If you want to compute a function `f(x_{1i},x_{2j},y_{1i},y_{2j})` in general, that could be hard to vectorize efficiently. Can you give an example? (PS: it seems that @jlhoward's suggested `proxy::dist()` takes an arbitrary function.) – Ben Bolker Dec 28 '14 at 18:07
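
One possible way to handle a function of all four coordinates, sketched here as an assumption rather than something proposed in the thread, is to call `outer()` on the row indices instead of on the coordinates. The functions `f` (Manhattan distance) and `g` (Chebyshev distance) are hypothetical examples chosen for illustration.

```r
set.seed(101)
n <- 3; m <- 4
A <- data.frame(x = runif(n), y = runif(n))
B <- data.frame(x = runif(m), y = runif(m))

# Hypothetical example: Manhattan distance, written with vectorised operations
f <- function(x1, x2, y1, y2) abs(x1 - x2) + abs(y1 - y2)

# outer() passes FUN two equally long index vectors covering every (i, j) pair,
# so FUN must be vectorised; f is, because abs() and + are
res <- outer(seq_len(n), seq_len(m),
             function(i, j) f(A$x[i], B$x[j], A$y[i], B$y[j]))

# Hypothetical example: Chebyshev distance for a single (i, j) pair.
# max() is not element-wise, so g is wrapped with Vectorize() before use
g <- function(i, j) max(abs(A$x[i] - B$x[j]), abs(A$y[i] - B$y[j]))
res2 <- outer(seq_len(n), seq_len(m), Vectorize(g))

dim(res)
dim(res2)
```

`Vectorize()` calls `g` once per pair internally, so it is convenient rather than fast.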

The dist(...) function in base R will not be helpful, because it calculates the auto-distances (distance from every point to every other point in a given dataset). You want cross-distances. There is a dist(...) function in package proxy which is designed for this.
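
To make the auto- versus cross-distance distinction concrete, here is a minimal base-R sketch. It stacks both sets, lets `stats::dist()` compute every pairwise distance, and then extracts the A-by-B block. This works but also computes the within-set distances, which are then thrown away; the names `full`, `cross` and `ref` are illustrative.

```r
set.seed(101)
n <- 10; m <- 20
A <- data.frame(x = runif(n), y = runif(n))
B <- data.frame(x = runif(m), y = runif(m))

# Auto-distances among all n + m stacked points
full <- as.matrix(stats::dist(rbind(A, B)))
dim(full)

# The cross-distances are the off-diagonal block:
# rows belonging to A, columns belonging to B
cross <- full[seq_len(n), n + seq_len(m)]
dimnames(cross) <- NULL

# Compare with a direct computation via outer()
ref <- sqrt(outer(A$x, B$x, "-")^2 + outer(A$y, B$y, "-")^2)
all.equal(cross, ref)
```

For large n and m the wasted work grows with (n + m)^2, which is one reason to prefer a dedicated cross-distance function.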

Using the dataset kindly provided by @BenBolker,

library(proxy)   # note that this masks the dist(...) fn in base R...
result <- dist(A,B)
result[1:5,1:5]
#           [,1]      [,2]      [,3]      [,4]      [,5]
# [1,] 0.5529902 0.7303561 0.1985409 0.6184414 0.7344280
# [2,] 0.7109408 0.9506428 0.1778637 0.7216595 0.9333687
# [3,] 0.2971463 0.3809688 0.4971621 0.4019629 0.3995298
# [4,] 0.4985324 0.5737397 0.4760870 0.5986826 0.5993541
# [5,] 0.4513063 0.7071025 0.3077415 0.4289675 0.6761988
jlhoward