Create neighborhood list of large dataset / speed up

Question

I want to create a weight matrix based on distance. My code currently looks as follows and works for a smaller sample of the data. However, with the full dataset (569424 individuals in 24077 locations) it does not run through. The problem arises in the nb2blocknb function. So my question is: how can I optimize my code for large datasets?

# load all survey data
DHS <- read.csv("Daten/final.csv")
attach(DHS)

# define coordinates matrix
coormat <- cbind(DHS$location, DHS$lon_s, DHS$lat_s)
coorm <- cbind(DHS$lon_s, DHS$lat_s)
colnames(coormat) <- c("location", "lon_s", "lat_s")
coo <- cbind(unique(coormat))
c <-  as.data.frame(coo)
coor <- cbind(c$lon_s, c$lat_s)

# get a list of neighboring locations that are within 50 km of each other
neighbor <- dnearneigh(coor, d1 = 0, d2 = 50, row.names=c$location,  longlat=TRUE, bound=c("GE", "LE"))

# get neighborhood list on individual level
nb <- nb2blocknb(neighbor, as.character(DHS$location))

# weight matrix in list format
nbweights.lw <- nb2listw(nb, style="B", zero.policy=TRUE)

Thanks a lot for your help!

Some related Q&A's: [*How to assign several names to lat-lon observations*](https://stackoverflow.com/q/21971814/2204410) and [*Geographic distance between 2 lists of lat/lon coordinates*](https://stackoverflow.com/q/31668163/2204410) — Jaap, Jan 02 '18 at 11:53
Where do the `dnearneigh` and `nb2blocknb` functions come from? Please, also specify the used packages. — Jaap, Jan 02 '18 at 11:55

score 0 · Answer 1 · answered Mar 14 '18 at 08:10

You're trying to make about 1.3e10 distance calculations. The results would be in the gigabyte range.
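As a rough back-of-the-envelope check, here is one way to arrive at a figure of that order. It assumes the count refers to every individual paired with every location and that each distance is stored as an 8-byte double; both are assumptions for illustration, not something the question states.

```r
# Sizes taken from the question
n_individuals <- 569424
n_locations   <- 24077

# Assumption: one distance per (individual, location) pair
n_pairs <- n_individuals * n_locations

# Assumption: each distance is stored as a double (8 bytes)
size_gib <- n_pairs * 8 / 1024^3

cat("pairs:", format(n_pairs, big.mark = ","), "\n")
cat("approx. size in GiB:", round(size_gib, 1), "\n")
```

Even if the real number of stored links is smaller because of the 50 km cutoff, this shows why a dense individual-level structure quickly outgrows memory.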

I think you'd want to limit either the maximum distance or the number of nearest neighbors you're looking for. Try nn2 from the RANN package:

   library("RANN")
   nearest_neighbours_w_distance <- nn2(coordinatesA, coordinatesB, 10)

Note that this operation is not symmetric: switching coordinatesA and coordinatesB gives different results.
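A minimal sketch of that asymmetry, using made-up projected coordinates in metres (the matrices A and B and the 50000 m radius are assumptions for illustration, not the asker's data). In nn2 the first argument is the set that is searched and the second is the set of query points, so the result has one row per query point. The last call shows how a distance cap can be combined with a neighbor cap through the radius search type.

```r
library(RANN)

set.seed(1)
# Hypothetical point sets in a projected CRS (units: metres)
A <- cbind(x = runif(200, 0, 100000), y = runif(200, 0, 100000))
B <- cbind(x = runif(50,  0, 100000), y = runif(50,  0, 100000))

# For every point in B, find its 10 nearest points in A
ab <- nn2(data = A, query = B, k = 10)
# For every point in A, find its 10 nearest points in B
ba <- nn2(data = B, query = A, k = 10)

dim(ab$nn.idx)  # one row per point in B
dim(ba$nn.idx)  # one row per point in A

# Cap both the distance and the neighbor count: at most 10 neighbors
# within 50 km; slots without a neighbor in range get index 0
capped <- nn2(data = A, query = A, k = 10,
              searchtype = "radius", radius = 50000)
```

When a point set is queried against itself, each point is returned as its own nearest neighbor at distance 0, so the first column usually needs to be dropped before building a neighbor list.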

Also, you would first have to convert your GPS coordinates to a coordinate reference system in which you can calculate Euclidean distances, for example UTM (code not tested):

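   library("sp")
   # Convert a two-column matrix (longitude, latitude in WGS84) to UTM coordinates.
   # spTransform() also needs a projection backend (rgdal or sf, depending on the sp version).
   gps2utm <- function(gps_coordinates_matrix, utmzone) {
      # SpatialPoints takes the coordinate matrix itself, not two separate vectors
      pts <- SpatialPoints(gps_coordinates_matrix[, 1:2],
                           proj4string = CRS("+proj=longlat +datum=WGS84"))
      return(spTransform(pts, CRS(paste0("+proj=utm +zone=", utmzone, " +datum=WGS84"))))
   }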
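The UTM zone number has to be chosen by the caller. A small sketch of the standard longitude-to-zone rule follows; the helper name utm_zone is hypothetical, and the rule ignores the special zones around Norway and Svalbard. Points in the southern hemisphere additionally need `+south` in the projection string.

```r
# Hypothetical helper: standard 6-degree UTM zone for a longitude in degrees.
# The modulo maps longitude 180 back to zone 1 instead of a non-existent zone 61.
utm_zone <- function(lon) {
  (floor((lon + 180) / 6) %% 60) + 1
}

utm_zone(c(-180, 0, 13.4, 179.9))

# Zones covered by a set of longitudes (made-up values for illustration)
lons <- c(28.5, 30.1, 35.9, 36.2)
table(utm_zone(lons))
```

If the locations span several zones, as a country-wide survey often does, a single UTM zone distorts distances far from its central meridian, so it is worth checking how many zones the data covers before projecting everything into one.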
