R Spatstat: Identify nearest neighbours for further use

Question

I have a dataframe with 488 GPS points (long and lat). For each 488 points I would like to find their 2 closest neighbours.

So far I have created a point pattern object and computed the distance from the nearest two points (below). However, I would like to go a step further and be able to identify these nearest points by their ID from the original dataset.

Currently, my script works like:

# 1. store x and y coords in two vectors
lon <- data$longitude
lat <- data$latitude

# 2. create two vectors xrange and yrange with dimensions of triangle that contain all points
xrange <- range(lon, na.rm=T)
yrange <- range(lat, na.rm=T)

# 3. create ppp
lf <- ppp(lon, lat, xrange, yrange)

plot(lf)

nndist(lf, k = 1:2)

Giving me (example of top 5 results):

             dist.1       dist.2
  [1,] 1.426925e-03 0.0017007414
  [2,] 1.017287e-03 0.0015574895
  [3,] 6.502012e-04 0.0010172867
  [4,] 6.502012e-04 0.0007202307
  [5,] 7.202307e-04 0.0010472445

But I would like to be able to link this back to the "hhid" from the original dataset to something like this:

  hhid         dist.1  dist.1.hhid         dist.2     dist.1.hhid
  1    1.426925e-03             7  0.0017007414                 3
  2    1.017287e-03             6  0.0015574895                 4
  3    6.502012e-04            10  0.0010172867                 5
  4    6.502012e-04             2  0.0007202307                 8
  5    7.202307e-04             1  0.0010472445                13

First 20 rows of original dataset :

structure(list(hhid = c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L, 
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L, 
2059L, 2062L, 2063L, 2065L, 2066L), longitude = c(-1.478302479, 
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441, 
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657, 
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046, 
-1.431989908, -1.432058454, -1.43134892, -1.430848002), latitude = c(12.10552216, 
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806, 
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853, 
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122, 
12.1154108, 12.11545277, 12.11554337, 12.11567497)), row.names = c(NA, 
20L), class = "data.frame")

score 2 · Answer 1 · answered Jun 29 '23 at 02:49

Since you used the spatstat command nndist to compute the nearest-neighbour distances, you could just use the corresponding command nnwhich to obtain the identifiers (serial numbers) of the nearest neighbours.

However, latitude and longitude coordinates cannot be treated as x and y coordinates. Even in a small region of the globe, one unit of latitude is not the same distance as one unit of longitude.

Future versions of spatstat will handle latitude/longitude coordinates. For now, you are probably better advised to use the sp and rgeos packages as described in the separate answer by jpsmith.

score 2 · Answer 2 · answered Jun 30 '23 at 14:28

If you plan to use spatstat for further analysis you need to project to planar coordinates first and then you can use nnwhich() as described by @adrian-baddely. Note that there is no universaly best coordinate reference system (CRS) to use, so it requires a choice from you. There is an R package crsuggest that can help you find relevant ones. Here I use UTM zone 30N which covers all of Burkina Faso as far as I can tell (srid=32630):

library(sf)
library(spatstat)

hhid <- c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L, 
          2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L, 
          2059L, 2062L, 2063L, 2065L, 2066L)
longitude <- c(-1.478302479, 
              -1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441, 
              -1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657, 
              -1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046, 
              -1.431989908, -1.432058454, -1.43134892, -1.430848002) 
latitude <- c(12.10552216, 
              12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806, 
              12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853, 
              12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122, 
              12.1154108, 12.11545277, 12.11554337, 12.11567497)
data <- data.frame(hhid, longitude, latitude)

dat_gps <- st_as_sf(data, crs = 4326, coords = c("longitude", "latitude"))
dat_proj <- st_transform(dat_gps, crs = 32630)
dat_ppp <- as.ppp(dat_proj)
ids <- marks(dat_ppp)

nd <- nndist(dat_ppp, k = 1:2)
nw <- nnwhich(dat_ppp, k = 1:2)

The “serial number” in spatstat will just be the index of the given point in the order the points are listed. Below are these as well as hhid:

data.frame(round(nd,2), nw, hhid = ids, dist.1.hhid = ids[nw[,1]], dist.2.hhid = ids[nw[,2]])
#>    dist.1 dist.2 which.1 which.2 hhid dist.1.hhid dist.2.hhid
#> 1  187.42 238.78       2       3 2004        2006        2009
#> 2  110.86 170.31       3       4 2006        2009        2012
#> 3   71.61 110.86       4       2 2009        2012        2006
#> 4   71.61  78.58       3       5 2012        2009        2013
#> 5   78.58 114.13       4       3 2013        2012        2009
#> 6   44.81  50.31       7       8 2020        2022        2023
#> 7   44.20  44.81       8       6 2022        2023        2020
#> 8   44.20  50.31       7       6 2023        2022        2020
#> 9  130.53 131.42      11      10 2028        2035        2029
#> 10 131.42 137.85       9      11 2029        2028        2035
#> 11 130.53 137.85       9      10 2035        2028        2029
#> 12 261.03 343.62      11      10 2036        2035        2029
#> 13 325.55 355.20       1       2 2043        2004        2006
#> 14 156.28 354.33      15      18 2046        2047        2063
#> 15 156.28 201.28      14      18 2047        2046        2063
#> 16 250.42 295.03      20      19 2059        2066        2065
#> 17   8.79  71.30      18      19 2062        2063        2065
#> 18   8.79  77.88      17      19 2063        2062        2065
#> 19  56.44  71.30      20      17 2065        2066        2062
#> 20  56.44 127.69      19      17 2066        2065        2062

score 0 · Accepted Answer · answered Jun 28 '23 at 20:40

This seems to be good an extension of the question posed here. Building off of that question's accepted answer to extend to your specific situation examining the closest two neighbors, you could do:

library(sp)
library(rgeos)
# dput structure in question assigned as "df"

spatialDF <- df
coordinates(spatialDF) <- ~longitude + latitude
dists <- gDistance(spatialDF, byid = TRUE)
min.2dists <- apply(dists, 1, function(x) order(x, decreasing = FALSE)[2:3])

# closest
df$hhid1 <- df[min.2dists[1,],"hhid"]
df$dist1 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[2])

# second closest
df$hhid2 <- df[min.2dists[2,],"hhid"]
df$dist2 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[3])

Output:

#    hhid longitude latitude hhid1        dist1 hhid2        dist2
# 1  2004 -1.478302 12.10552  2006 1.700741e-03  2009 0.0021825687
# 2  2006 -1.477470 12.10701  2009 1.017287e-03  2012 0.0015574895
# 3  2009 -1.476489 12.10674  2012 6.502012e-04  2006 0.0010172867
# 4  2012 -1.476147 12.10618  2009 6.502012e-04  2013 0.0007202307
# 5  2013 -1.475480 12.10645  2012 7.202307e-04  2009 0.0010472445
# ...

R Spatstat: Identify nearest neighbours for further use

3 Answers3