6

I want to calculate the distance between two points in two different datasets. I don't want to calculate the distance between all points - just to the nearest point of datasetB.
Some examples:

Dataset A - Persons
http://pastebin.com/HbaeqACi

Dataset B - Waterfeatures:
http://pastebin.com/UdDvNtHs

Dataset C - City:
http://pastebin.com/nATnkMRk

So...I want to calculate the distance of each person to the nearest waterfeature point.
I've already tried to work with the rgeos package and after struggling with some projections errors, I've got it to work. But this calculate (at least I assume it) all distances to every point, but, as already said, I've only interested in the distance to the nearest waterfeature point.

# load csv files
persons = read.csv("persons.csv", header = TRUE)
water = read.csv("water.csv", header = TRUE)
# change dataframes to SpatialPointDataFrame and assign a projection
library(sp)
library(rgeos)
coordinates(persons) <- c("POINT_X", "POINT_Y")
proj4string(persons) <- CRS("+proj=utm +datum=WGS84")
coordinates(water) <- c("POINT_X", "POINT_Y")
proj4string(water) <- CRS("+proj=utm +datum=WGS84")

# use rgoes package to calculate the distance
distance <- gDistance(persons, water, byid=TRUE)
# works, but calculates a huge number of distances

Is there any parameter, which I've missed. Or do I need to use another package or function? I've also looked at spatstat, which is able to calculate the distance to the nearest neighbor, but not of two different datasets: http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/spatstat/html/nndist.html


Edit:
The complete R-Script including plotting of the datasets:

library(RgoogleMaps)
library(ggplot2)
library(ggmap)
library(sp)
library(fossil)

#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)

# plot data
persons_ggplot2 <- persons
city_ggplot2 <- city
water_ggplot2 <- water
gc <- geocode('new york, usa')
center <- as.numeric(gc)  
G <- ggmap(get_googlemap(center = center, color = 'bw', scale = 1, zoom = 11, maptype = "terrain", frame=T), extent="device")
G1 <- G + geom_point(aes(x=POINT_X, y=POINT_Y ),data=city, shape = 22, color="black", fill = "yellow", size = 4) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=persons, shape = 8, color="red", size=2.5) + geom_point(aes(x=POINT_X, y=POINT_Y ),data=water_ggplot2, color="blue", size=1)
plot(G1)

#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Generate a function that looks for the closest waterfeature for each id coordinates
NearestW <- function(id){
tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
WaterFeatures <- rbind(tmp,water[,2:3])
tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
tmp1 <- which.min(tmp1)
tmp1 <- water[tmp1,1]
tmp1 <- data.frame(tmp1, WaterFeature=tmp)
return(tmp1)
}

#apply to each id and the merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))

enter image description here

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
schlomm
  • 551
  • 2
  • 11
  • 22

2 Answers2

5

What about writing a function that looks for the nearest waterfeature for every person?

#requires function earth.dist from "fossil" package
require(fossil)

#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)

#Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)


#Generate a function that looks for the closest waterfeature for each id coordinates
NearestW <- function(id){
   tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
   WaterFeatures <- rbind(tmp,water[,2:3])
   tmp1 <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
   tmp1 <- min(tmp1)
   tmp1 <- data.frame(tmp1, WaterFeature=tmp)
   return(tmp1)
 }

#apply to each id and the merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))

NOTE: I've added a stringsAsFactors parameter to the original read.csv , it make the merging easier at the end

NOTE:Column tmp1 notes the number of METERS to the nearest water feature

eclark
  • 819
  • 7
  • 16
  • Thanks for this tip. Unfortunately I get the following error aftetr executing `persons$nearest <- sapply(persons$Id, NearestW)`: _Error in `$<-.data.frame`(`*tmp*`, "nearest", value = list()) : replacement has 0 rows, data has 164_. I've save it to a new data.frame with `persons_nearest <- sapply(persons$Id, NearestW)`, but the output is a empty list. – schlomm Mar 02 '14 at 00:48
  • What output do you get if you run `NearestW("001")` ? – eclark Mar 02 '14 at 00:51
  • By running `NearestW("001")` I get the following output: `> [1] 0` – schlomm Mar 02 '14 at 00:55
  • I'm not sure what the problem is. I loaded the data from the links you provided and I've gotten it to work in my machine but maybe it imported differently from the one you got. Just to check... after doing the read.csv is persons 5 columns wide and water 3 columns wide? – eclark Mar 02 '14 at 01:00
  • Argh...I was even too bad to paste your code properly into my R-Script (I've missed the line, where a person ID is generated). Everything works now :) Btw: You've said, the computation could be more faster by calculating the neareast water feature for each unique set of coordinates and then merge these. How does this look? Because...I've transferred your approach to my real dataset (about 2500 persons and 40 cities (so these 2500 persons are located over 40 locations) and the calculation needs already 20min with an i5 @ 4*4Ghz oO. – schlomm Mar 02 '14 at 01:36
  • Don't worry. I've edited the code to reflect the unique coordinate thing and then merge with persons.df – eclark Mar 02 '14 at 02:17
  • eclark: You're my hero! The performance difference is incredible! I think, tomorrow there will occur some more questions, so I'm very glad that you've helped me (also thanks for yesterday :)). One questions left: The tmp1 is the distance right? It the distance in meters? – schlomm Mar 02 '14 at 02:39
  • No problem. Glad to help. Yeah tmp1 is the distance and its in kms – eclark Mar 02 '14 at 02:46
  • Thanks. Sorry for this (maybe stupid) question: kms is = meters? – schlomm Mar 02 '14 at 02:54
  • Its kilometers so its 1000 meters – eclark Mar 02 '14 at 03:13
  • Mhh...this is crazy, because no of the persons is more than a few hundreds meters away from a waterfeature point?! I've added the plotting and the plot itself to my first post to visualize the fact. Is there an error in the calculation or did I miss anything else? – schlomm Mar 02 '14 at 03:36
  • Sorry I've miss read, tmp1 is NOT the distance, tmp1 is the id of the nearest water feature as per the first column of the water data.frame. Does that make sense? Give me a second to make tmp1 the distance. I'll edit it in the answer – eclark Mar 02 '14 at 03:42
  • Awesome! Works like a charm. I've checked the distance with ArcGIS' measurement tool and it's the same. Thanks! <3 – schlomm Mar 02 '14 at 11:29
2

Maybe I'm a little too late, but you can use spatstat to compute distances between two different datasets. The command is nncross. The arguments you have to use are two objects of type ppp, which you can create using the as.ppp() function.

Mario Becerra
  • 514
  • 1
  • 6
  • 16
  • Maybe to late for the reason why I've asked the question, but definitely helpful for the future! Thanks! :) – schlomm Nov 22 '15 at 21:30
  • However, if you use spatstat it is **crucial that you project to planar coordinates** first. It does not automagically recognize longitude,latitude data. However, the C code for `nncross` is quite efficient, so you may experience a significant speed gain for large datasets. – Ege Rubak Apr 08 '19 at 22:02