1

I am computing distance matrix between large number of locations (5000) on sphere (using Haversine distance function).

Here is my code:

require(geosphere)
x=rnorm(5000)
y=rnorm(5000)
xy1=cbind(x,y)

The time taken for computing the distance matrix is

 system.time( outer(1:nrow(xy1), 1:nrow(xy1), function(i,j) distHaversine(xy1[i,1:2],xy1[j,1:2])))

The time taken to execute this program is high. Any suggestion how to lower time consumption to do this job! Thanks.

www
  • 38,575
  • 12
  • 48
  • 84
Janak
  • 653
  • 7
  • 25
  • 1
    you can try an alternative implementation. see http://www.r-bloggers.com/computational-efficiency-of-great-circle-distance-calculations-in-r/ – Leo May 24 '16 at 07:45
  • 2
    @Leo in good conscience and without meaning to offend I have to point out that the linked article is *terrible*! The author uses a `for` loop to cycle through a vector to repeatedly call a function (`distHaversine()`) which is already *vectorised*!! They wrote *more* code whilst also slowing the speed of execution by about 300X!!! Do not heed this article! You don't call a function 10,000 times when once will do! – Simon O'Hanlon May 24 '16 at 10:07
  • Hi @SimonO'Hanlon , thanks for the heads up. :-) – Leo May 24 '16 at 12:17

2 Answers2

3

Try the built-in function in the geosphere package?

z <- distm( xy1 )

The default distance function for distm() - which calculates a distance matrix between a set of points - is the Haversine ("distHaversine") formula, but you may specify another using the fun argument.

On my 2.6GHz Core i7 rMBP this takes about 5 seconds for 5,000 points.

Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
1

I add below a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.

library(geosphere)
library(spatialrisk)
library(data.table)

x=rnorm(5000)
y=rnorm(5000)
xy1 = data.table(x,y)

# Cross join two data tables
coordinates_dt <- optiRum::CJ.dt(xy1, xy1)

system.time({
  z <- distm( xy1 )
})
# user  system elapsed 
# 14.163   3.700  19.072 

system.time({
  distances_m <- coordinates_dt[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)]
})
# user  system elapsed 
# 2.027   0.848   2.913 
mharinga
  • 1,708
  • 10
  • 23