Compute Euclidean distance matrix from x, y, z coordinates

Question

I have x, y, and z coordinates from a principal component analysis (PCA), and I would like to compute a Euclidean distance matrix from them.

Test data:

                  X           Y             Z
samp_A -0.003467119 -0.01422762 -0.0101960126
samp_B -0.007279433  0.01651597  0.0045558849
samp_C -0.005392258  0.02149997  0.0177409387
samp_D -0.017898802  0.02790659  0.0006487222
samp_E -0.013564214  0.01835688  0.0008102952
samp_F -0.013375397  0.02210725 -0.0286032185
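For anyone who wants to reproduce this, the test data can be entered in R as a data frame with the sample names as row names. This is a sketch: the object name `pca_coords` matches the one used in the edit below, but how the asker actually stored the data is an assumption.

```r
# Test data from the question: one row per sample, columns X, Y, Z
pca_coords <- data.frame(
  X = c(-0.003467119, -0.007279433, -0.005392258,
        -0.017898802, -0.013564214, -0.013375397),
  Y = c(-0.01422762, 0.01651597, 0.02149997,
        0.02790659, 0.01835688, 0.02210725),
  Z = c(-0.0101960126, 0.0045558849, 0.0177409387,
        0.0006487222, 0.0008102952, -0.0286032185),
  row.names = c("samp_A", "samp_B", "samp_C",
                "samp_D", "samp_E", "samp_F")
)
```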

I would ultimately like to return a table in the following format:

    A    B     ...
A   0    0.2   ...
B   0.2  0     ...
... ...  ...   ...
... ...  ...   ...

Obviously the distance data above is fake. The X, Y and Z data is simply the head of the full dataset, which consists of about 4000 entries, so I assume this needs to be done efficiently. If it's easier, computing only the distances to each point's nearest 10 neighbours would suffice (the remaining entries could be NA or 0).
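A minimal sketch of the nearest-neighbour variant, assuming the full matrix fits in memory (for 4000 points, a 4000 x 4000 matrix of doubles is roughly 128 MB): compute all distances with `dist()`, then set everything except each row's k smallest off-diagonal entries to NA. The helper name `keep_nearest` and the choice of NA over 0 are illustrative, not from the thread.

```r
# Hypothetical helper: keep each point's k nearest neighbours, set the rest to NA.
keep_nearest <- function(coords, k = 10) {
  d <- as.matrix(dist(coords))          # full symmetric Euclidean distance matrix
  diag(d) <- NA                         # ignore each point's zero distance to itself
  for (i in seq_len(nrow(d))) {
    far <- order(d[i, ])[-seq_len(k)]   # everything beyond the k smallest (NA sorts last)
    d[i, far] <- NA
  }
  d
}

# Example with random 3D coordinates
set.seed(1)
coords <- matrix(runif(300), ncol = 3)  # 100 points
nn <- keep_nearest(coords, k = 10)
```

Note that the result is generally not symmetric: j being among i's 10 nearest neighbours does not imply the reverse.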

Any help would be much appreciated!

EDIT: Someone suggested using dist, but I do not believe it allows for three coordinates. If I use dist, the results seem to be nonsense(?).

> pca_coords_dist <- dist(pca_coords)
> head(pca_coords_dist)
[1] 0.03431210 0.04539427 0.04583855 0.03584466 0.04191922 0.04291657

I believe one way to go about this is to write a function that computes the distance and apply it to each pair of rows. I think this is a correct function for computing distance in three dimensions:

euc.dist.3 <- function(x1, x2, y1, y2, z1, z2 ) sqrt( (x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 )

If I apply this to samp_A and samp_B, the result is 1.56643.

Now, is there a way to apply this function to every pair of rows and format the output as a distance matrix?
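One way to do exactly that, sketched under the assumption that the coordinates are in the first three columns: because `euc.dist.3` uses only vectorised arithmetic, it can be passed to `outer()` over row indices, which returns the full n x n matrix directly. The wrapper name `pairwise_euc` and the random example matrix are hypothetical.

```r
# The asker's function, unchanged
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)

# Hypothetical wrapper: evaluate euc.dist.3 for every (i, j) pair of rows via outer()
pairwise_euc <- function(coords) {
  x <- coords[, 1]; y <- coords[, 2]; z <- coords[, 3]
  idx <- seq_len(nrow(coords))
  d <- outer(idx, idx, function(i, j) euc.dist.3(x[i], x[j], y[i], y[j], z[i], z[j]))
  dimnames(d) <- list(rownames(coords), rownames(coords))  # label rows/cols by sample
  d
}

set.seed(42)
pts <- matrix(runif(15), nrow = 5, dimnames = list(LETTERS[1:5], c("X", "Y", "Z")))
pairwise_euc(pts)
```

This produces the same numbers as `as.matrix(dist(pts))`; `dist()` is simply the built-in way of doing it.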

It is my understanding that `dist` does not work with three coordinates. I would need to apply a function to each row in a pairwise manner. — user2117258, Sep 24 '16 at 01:37
If you do `euc.dist.3` for A and B it gives 0.0343121. Do `euc.dist.3(-0.003467119, -0.007279433, -0.01422762, 0.01651597, -0.0101960126, 0.0045558849)` — R. Schifini, Sep 24 '16 at 02:28
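A quick check of this comment (a sketch; `a` and `b` are simply samp_A and samp_B typed in by hand): with the argument order x1, x2, y1, y2, z1, z2, `euc.dist.3` gives the same value as `dist()`, which is also the first value printed by `head(pca_coords_dist)` in the question.

```r
# The asker's function, unchanged
euc.dist.3 <- function(x1, x2, y1, y2, z1, z2) sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)

a <- c(-0.003467119, -0.01422762, -0.0101960126)  # samp_A: X, Y, Z
b <- c(-0.007279433,  0.01651597,  0.0045558849)  # samp_B: X, Y, Z

by_hand <- euc.dist.3(a[1], b[1], a[2], b[2], a[3], b[3])
by_dist <- as.numeric(dist(rbind(a, b)))          # dist() on a 2-row matrix
```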

score 5 · Accepted Answer · answered Sep 24 '16 at 01:48

Try `dist` in R (see `?dist`):

distance.matrix <- dist(yourData, method = "euclidean", diag = TRUE)

In the code above, `yourData` is your data.frame or matrix of numeric coordinates, with one row per sample.
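To get the table layout asked for in the question, wrap the result in `as.matrix()`. A sketch using the first three test samples, renamed to A, B and C for brevity (the renaming is an assumption for display only):

```r
# First three samples from the question, with short row names
pca_coords <- data.frame(
  X = c(-0.003467119, -0.007279433, -0.005392258),
  Y = c(-0.01422762, 0.01651597, 0.02149997),
  Z = c(-0.0101960126, 0.0045558849, 0.0177409387),
  row.names = c("A", "B", "C")
)

# dist() stores only the lower triangle; as.matrix() expands it to the full
# symmetric n x n table, labelled with the data frame's row names
dist_table <- as.matrix(dist(pca_coords, method = "euclidean"))
round(dist_table, 4)
```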

xtluo


It is a `data.frame` – user2117258 Sep 24 '16 at 01:54
I do not believe `dist` supports three coordinates. Please see edit above. – user2117258 Sep 24 '16 at 01:56
**dist** returns a distance object, let's say **dis.mat**; after converting it with **as.matrix(dis.mat)**, element **[i, j]** is the chosen distance between the _i_th and _j_th rows of your **data.frame** – xtluo Sep 24 '16 at 02:02
and certainly **dist** supports three coordinates; in fact it supports any number of columns in your data.frame, like 3, 10, 20... – xtluo Sep 24 '16 at 02:04
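Both points in these comments can be illustrated with a short sketch on random data (the 5 x 10 matrix is made up for illustration): `dist()` treats every column as a dimension, and its result stores only the n(n - 1)/2 pairwise values, so it should be converted with `as.matrix()` before indexing with `[i, j]`.

```r
# dist() accepts any number of columns; each row is one point
set.seed(7)
m <- matrix(rnorm(50), nrow = 5)   # 5 points in 10 dimensions
d <- dist(m)                       # object of class "dist" (lower triangle only)

length(d)                          # n * (n - 1) / 2 = 10 pairwise distances
dm <- as.matrix(d)                 # full 5 x 5 matrix, safe to index as dm[i, j]
all.equal(dm[3, 1], sqrt(sum((m[3, ] - m[1, ])^2)))
```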

score 4 · Answer 2 · edited May 23 '17 at 12:33

EDIT: As stated by Xiaotao Luo and Richard Telford, dist() works for 3D coordinates. In fact, this answer gives the same results as dist(), so use dist()!

You could do something similar to this answer:

First, create an index matrix with all pairwise row combinations, using this example matrix:

x = matrix(runif(15),nrow = 5)

          [,1]       [,2]       [,3]
[1,] 0.1307924 0.94255848 0.55138616
[2,] 0.7027617 0.11180608 0.73997077
[3,] 0.5573857 0.64836253 0.11229408
[4,] 0.4391854 0.04849022 0.93454137
[5,] 0.5292623 0.19308569 0.00826927

ind = t(combn(nrow(x), 2))

> ind
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5

Then calculate the 3D distance for each of these combinations using apply:

distances = apply(ind, 1, function(z){
    sqrt(sum((x[z[1],] - x[z[2], ])^2))
})

which gives:

> cbind(data.frame(ind), distances)
   X1 X2 distances
1   1  2 1.0260910
2   1  3 0.6792164
3   1  4 1.0204275
4   1  5 1.0077022
5   2  3 0.8384540
6   2  4 0.3336751
7   2  5 0.7563700
8   3  4 1.0246505
9   3  5 0.4678558
10  4  5 0.9418077
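The `apply()` loop can also be replaced by one vectorised expression: index all the "first" rows and all the "second" rows at once, subtract, and sum the squares per row. This is a sketch using a freshly seeded random matrix, so its numbers differ from the table above; the pair order from `combn()` matches the order in which `dist()` stores its values.

```r
set.seed(1)
x <- matrix(runif(15), nrow = 5)
ind <- t(combn(nrow(x), 2))

# One row of differences per pair, then Euclidean norm of each row
diffs <- x[ind[, 1], , drop = FALSE] - x[ind[, 2], , drop = FALSE]
distances <- sqrt(rowSums(diffs^2))
```

For about 4000 points this creates roughly 8 million pairs (about 192 MB for `diffs` alone), so `dist()` remains the more memory-friendly choice.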

In brief:

ind = t(combn(nrow(x), 2))
distances = apply(ind, 1, function(z){
    sqrt(sum((x[z[1],] - x[z[2], ])^2))
})
result = cbind(data.frame(ind), distances)

where `x` is your matrix of 3D coordinates.
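If the square layout from the question is needed, the long (X1, X2, distances) result can be folded back into a symmetric matrix with matrix indexing. A sketch on a seeded random `x` (an assumption for reproducibility); the result matches `as.matrix(dist(x))` apart from dimnames.

```r
set.seed(1)
x <- matrix(runif(15), nrow = 5)
ind <- t(combn(nrow(x), 2))
distances <- apply(ind, 1, function(z) sqrt(sum((x[z[1], ] - x[z[2], ])^2)))

n <- nrow(x)
full <- matrix(0, n, n)          # diagonal stays 0
full[ind] <- distances           # fill (i, j) for i < j via two-column matrix indexing
full[ind[, 2:1]] <- distances    # mirror into (j, i)
```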
