
I have x, y, and z coordinates from a Principal Component Analysis, and I would like to compute a Euclidean distance matrix from them.

Test data:

                  X           Y             Z
samp_A -0.003467119 -0.01422762 -0.0101960126
samp_B -0.007279433  0.01651597  0.0045558849
samp_C -0.005392258  0.02149997  0.0177409387
samp_D -0.017898802  0.02790659  0.0006487222
samp_E -0.013564214  0.01835688  0.0008102952
samp_F -0.013375397  0.02210725 -0.0286032185

I would ultimately like to return a table in the following format:

    A    B     ...
A   0    0.2   ...
B   0.2  0     ...
... ...  ...   ...
... ...  ...   ...

Obviously the distance data above is fake. The X, Y and Z data is simply the head of the full dataset, which consists of about 4000 entries, so I assume this would need to be done in an efficient manner. If it's easier, computing only the nearest distances for each point (say, the 10 nearest points) could suffice; the remaining entries would be NA or 0.

Any help would be much appreciated!

EDIT: A suggestion arose to use dist, but I do not believe this allows for three coordinates. If I use dist, the results seem to be nonsense(?).

> pca_coords_dist <- dist(pca_coords)
> head(pca_coords_dist)
[1] 0.03431210 0.04539427 0.04583855 0.03584466 0.04191922 0.04291657

I believe one way to go about this is to create a function to compute distance and apply it to each row in a pairwise manner. I think this is a correct function to compute distance in three dimensions.

euc.dist.3 <- function(x1, x2, y1, y2, z1, z2 ) sqrt( (x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 )

If I apply this to samp_A and samp_B, the result is 1.56643.

Now, is there a way to apply this function to every pair of rows and format the output as a distance matrix?

user2117258

2 Answers


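Try ?dist in R:

distance.matrix <- dist(yourData, method = "euclidean", diag = TRUE)

In the code above, yourData is your data.frame or matrix. dist() computes the distance between every pair of rows using all columns, so it handles three coordinates directly.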
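To get the square table layout shown in the question, the dist object can be converted with as.matrix(). This is a minimal sketch using only the first three rows of the question's test data; the name pca_coords is borrowed from the question's edit, and the rounding is only for display.

```r
# First three rows of the test data from the question
pca_coords <- matrix(
  c(-0.003467119, -0.01422762, -0.0101960126,
    -0.007279433,  0.01651597,  0.0045558849,
    -0.005392258,  0.02149997,  0.0177409387),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("samp_A", "samp_B", "samp_C"), c("X", "Y", "Z"))
)

# dist() uses every column, so all three coordinates enter the distance
d <- dist(pca_coords, method = "euclidean")

# as.matrix() expands the compact "dist" object into a full symmetric matrix
# with the sample names as row and column labels
dist_matrix <- as.matrix(d)
print(round(dist_matrix, 4))
```

For about 4000 rows, the full matrix has 4000 x 4000 entries, roughly 128 MB as doubles, so keeping the compact dist object and converting only when needed may be preferable.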

xtluo

EDIT: dist(), as stated by Xiaotao Luo and Richard Telford, works for 3D coordinates. In fact, this answer gives the same results as dist(), so use dist()!
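As a quick check of that claim, the hand-written formula and dist() can be compared on samp_A and samp_B from the question. This is only a sketch; the variable names a, b, manual and via_dist are made up for the illustration.

```r
# samp_A and samp_B from the question's test data
a <- c(X = -0.003467119, Y = -0.01422762, Z = -0.0101960126)
b <- c(X = -0.007279433, Y =  0.01651597, Z =  0.0045558849)

# Hand-written 3D Euclidean distance
manual <- sqrt(sum((a - b)^2))

# dist() on the same two rows returns a single pairwise distance
via_dist <- as.numeric(dist(rbind(a, b)))

print(c(manual = manual, dist = via_dist))
print(isTRUE(all.equal(manual, via_dist)))
```

Both values come out at about 0.0343, which matches the first entry of head(pca_coords_dist) in the question, so the dist() output there is not nonsense.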

You could do something similar to this answer:

First create an index matrix with all pairwise row combinations:

Using:

x = matrix(runif(15),nrow = 5)

          [,1]       [,2]       [,3]
[1,] 0.1307924 0.94255848 0.55138616
[2,] 0.7027617 0.11180608 0.73997077
[3,] 0.5573857 0.64836253 0.11229408
[4,] 0.4391854 0.04849022 0.93454137
[5,] 0.5292623 0.19308569 0.00826927

ind = t(combn(nrow(x), 2))

> ind
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5

Then proceed to calculate the 3D distance for all these combinations using apply:

distances = apply(ind, 1, function(z){
    sqrt(sum((x[z[1],] - x[z[2], ])^2))
})

which gives:

> cbind(data.frame(ind), distances)
   X1 X2 distances
1   1  2 1.0260910
2   1  3 0.6792164
3   1  4 1.0204275
4   1  5 1.0077022
5   2  3 0.8384540
6   2  4 0.3336751
7   2  5 0.7563700
8   3  4 1.0246505
9   3  5 0.4678558
10  4  5 0.9418077

In brief:

ind = t(combn(nrow(x), 2))
distances = apply(ind, 1, function(z){
    sqrt(sum((x[z[1],] - x[z[2], ])^2))
})
result = cbind(data.frame(ind), distances)

where x is your matrix with 3D coordinates (one row per point).
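The result above is a long table of pairs rather than the square matrix the question asks for. One way to reshape it, sketched here under the assumption that ind and distances were built as above, is to index a zero matrix with ind and with its column-swapped copy. The seed and the name dist_matrix are arbitrary choices for the example.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- matrix(runif(15), nrow = 5)

# Pairwise row combinations and their distances, as in the answer
ind <- t(combn(nrow(x), 2))
distances <- apply(ind, 1, function(z) {
  sqrt(sum((x[z[1], ] - x[z[2], ])^2))
})

# Fill a square matrix: zeros on the diagonal, each distance mirrored
n <- nrow(x)
dist_matrix <- matrix(0, nrow = n, ncol = n)
dist_matrix[ind] <- distances            # entries (i, j) with i < j
dist_matrix[ind[, 2:1]] <- distances     # mirrored entries (j, i)

# Compare with dist(); check.attributes = FALSE ignores the dimnames
# that as.matrix() adds
print(all.equal(dist_matrix, as.matrix(dist(x)), check.attributes = FALSE))
```

The comparison should print TRUE, which is the sense in which this approach and dist() agree. With about 4000 rows, combn() produces close to 8 million pairs, so the apply() loop is likely to be much slower than dist().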

R. Schifini