1

I have a dataframe of locations with their longitude and latitude, it looks something like:

set.seed(211)
latitude <-runif(5000, min=50, max=55)
longitude <- runif(5000, min=-2, max=0)
location_id <- seq(1,5000)

reprex <- data.frame(location_id, latitude, longitude)

For each location_id, I need to count the number of other locations within the list that are within 10 miles (~16000 metres) of that location.

I was thinking of using geosphere::distGeo() for this within some sort of for loop (or perhaps an apply function), but I just cannot work out how to code it so that it compares every other item in the list with the current item and counts how many are within a certain threshold, records that value and then moves on to the next row.

Does anyone know how to write this?

fabla
  • 1,806
  • 1
  • 8
  • 20
Mel
  • 700
  • 6
  • 31
  • Is computing speed an important issue here? The simplest code options for your problem are probably not going to be the fastest ones. If you need speed, it's certainly worth developping an algorithm that groups your points into categories (think grid pattern on your map). If your grid step is 10 miles, then you only have to look at the elements in the same group, or in the neighbors group, instead of browsing the whole map for each point. Depending on the number of points you have, and their density, the impact of this optimisation could be more or less important. – Andéol Evain Mar 06 '20 at 15:26

4 Answers4

0

The distGeo function can do it but you need a loop. Note that the first column of the coordinates must be longitude.

lst <- vector(50, mode="list")

for(i in 1:50) {
    dist <- distGeo(p1=reprex[i,c(3,2)], p2=reprex[-i,c(3,2)])
    lst[[i]] <- sum(dist<16000)
}

reprex$n <- unlist(lst)

table(unlist(lst))
 0  1  2 
34 10  6

So 34 of the 50 points don't have any other point within 10 miles (~16,000 metres), 10 have only 1 and 6 have 2.

Edward
  • 10,360
  • 2
  • 11
  • 26
0

The rdist.earth function in fields seems useful for this, for example as:

library(fields)
dist.matrix <- rdist.earth(reprex[-1])
colSums(dist.matrix<10)

The only trick in this case is using colSums on a logical matrix to count the number of TRUE values.

Note that miles are the default, km can be used with the argument miles=FALSE.

Miff
  • 7,486
  • 20
  • 20
0

Hiding the loop in a (still slow) apply and untangling latitude and longitude (they are usually the other way round), you could try something like

set.seed(211)
latitude <-runif(5000, min=50, max=55)
longitude <- runif(5000, min=-2, max=0)
location_id <- seq(1, 5000)
reprex <- data.frame(location_id, latitude, longitude)

library(geosphere)
within10m <- function(p1, p2, dist=16000){
  sum(geosphere::distGeo(p1, p2) <= dist)
  }
matpoints <- as.matrix(reprex[, 3:2])
reprex$neighbours <- 
  apply(matpoints, 1, within10m, p2=matpoints) - 1
head(reprex) 
#   location_id latitude  longitude neighbours
# 1           1 51.17399 -1.1489713         48
# 2           2 54.52623 -1.8554624         39
# 3           3 54.84852 -0.3014742         56
# 4           4 51.72104 -1.8644226         50
# 5           5 51.32793 -0.7417923         56
# 6           6 50.07346 -0.8939857         36
Henry
  • 6,704
  • 2
  • 23
  • 39
0

Ultimately I used the answer here as it was quite elegant and avoided loops: Calculating number of points within a certain radius

I used the code:

library(geosphere) # for distHaversine() and distm() functions

reprex <- cbind(reprex, # appends to the dataset... 
                     count_nearby=rowSums( # ... a column which counts the rows in the dataset...
                       distm (reprex[,3:2], fun = distHaversine) # ... where the distance between current and other rows...
                       <= 16000)-1 # ... is less than 16000 metres. Take one away because it counts itself!
                ) #close the cbind brackets!
Mel
  • 700
  • 6
  • 31