R distance matrix with non-zero values on diagonal (rdist.earth )

Question

The entries on the diagonal of a spatial distance matrix should be zero, bc they represent the distance between each location and itself. But the rdist.earth() function from the fields R package sometimes give me non-zeros on the diagonal:

> # Set number of decimals of output display
> options(digits=8)
> # Some longitude, latitude data
> LLdat
     lon      lat
 1 -105.85878 43.65797
 2 -105.81812 43.57009
 3 -105.80796 43.57748
 > 
 > # Create distance matrix
 > library(fields)
 > distmat <- rdist.earth(LLdat,LLdat)
 > distmat
      1              2          3
1 0.0000000 6.410948951394 6.12184338
2 6.4109490 0.000059058368 0.72150586
3 6.1218434 0.721505863563 0.00000000

In the above distance matrix, the second entry on the diagonal is 0.000059058368, in miles (the default unit), whereas the other two are 0.0000000. First, why do the entries of the second column show more digits than the other two? And why is the entry on the second diagonal not zero to 8 decimals like the others? The discrepancy doesn't seem small enough to attribute to floating point rounding error.

Now compare the output of rdist.earth() to that of a different package, geosphere, and function distGeo(), which computes the distance between two points (not a full distance matrix). Here, we compute the distance between each point and itself. The output vector units are in meters:

> library(geosphere)
> distmat2 <- distGeo(LLdat,LLdat)
> distmat2
[1] 0 0 0

So with distGeo(), all three distances measures are in agreement and appropriately zero.

Is there something I'm missing? Or does this indicate a problem with rdist.earth()?

score 3 · Accepted Answer · answered Mar 12 '18 at 23:56

Unfortunately, it is a rounding error.

If you look at the source code, you can replicate the issue:

x1 <- LLdat

R <- 3963.34
coslat1 <- cos((x1[, 2] * pi)/180)
sinlat1 <- sin((x1[, 2] * pi)/180)
coslon1 <- cos((x1[, 1] * pi)/180)
sinlon1 <- sin((x1[, 1] * pi)/180)

pp <- cbind(coslat1 * coslon1, coslat1 * sinlon1, sinlat1) %*% 
    t(cbind(coslat1 * coslon1, coslat1 * sinlon1, sinlat1))
return_val = (R * acos(ifelse(abs(pp) > 1, 1 * sign(pp), pp)))

The function first calcs an intermediate matrix pp:

print (pp)

             [,1]         [,2]         [,3]
[1,] 1.0000000000 0.9999986917 0.9999988071
[2,] 0.9999986917 1.0000000000 0.9999999834
[3,] 0.9999988071 0.9999999834 1.0000000000

It seems like the diagonal is all the same. However:

print(pp, digits=22)
                         [,1]                     [,2]                     [,3]
[1,] 1.0000000000000000000000 0.9999986917465573110775 0.9999988070789928018556
[2,] 0.9999986917465573110775 0.9999999999999998889777 0.9999999834298258782894
[3,] 0.9999988070789928018556 0.9999999834298258782894 1.0000000000000000000000


> acos(0.9999999999999998889777) * R
[1] 5.905836821e-05
> acos(1.0000000000000000000000) * R
[1] 0

Julius Vainora · Answer 2 · 2018-03-13T00:24:44.047

As @thc explains, it indeed is a numeric problem, apparently related to the formula choice. In particular, notice that before using acos all the values are very close to 1. The derivative of acos at x is -(1-x^2)^(-1/2), diverging to -Inf as x goes to 1, so it's not surprising that the formula is sensitive.

As to deal with that, you may implement one of the other proposed and more stable solutions in the wikipedia page, use geosphere as they seem to have a much more cautious implementation, or of course you may just set diag(M) <- 0. However, the latter option perhaps is not advisable because those numeric issues may remain also in the off-diagonal terms when the points are really very close in space.

R distance matrix with non-zero values on diagonal (rdist.earth )

2 Answers2