0

I'm trying to find the 5 closest stations from one data set (set1) to another data set (set2). This post is what I'm using as a basis and it seems simple to find the single closest, but I'm writing for loops to deal with it and is not efficient. Furthermore, I'm getting and error and don't understand why it's not working. Ideally, I would like to use set1 to find the closest stations in set2, find the 5 closest stations, and add a column for each station, for each unique id from set1.

Edit: This question is different from How to assign a name to lat-long observations based on shortest distance because I'm trying to find the 5 closest stations, not just a single distance. Also, the method is different for finding the minimum. Please reopen this question.

dput:

set1 <- structure(list(id = c(5984, 7495, 4752, 2654, 4578, 9865, 3265, 
1252, 4679, 1346), lat = c(48.39167, 48.148056, 48.721111, 47.189167, 
47.054443, 47.129166, 47.306667, 47.84, 47.304167, 48.109444), 
    lon = c(13.671114, 12.866947, 15.94223, 11.099736, 12.958342, 
    14.203892, 11.86389, 16.526674, 16.193064, 17.071392)), row.names = c(NA, 
10L), class = "data.frame", .Names = c("id", "lat", "lon"))

set2 <- structure(list(id = 1:10, lat = structure(c(35.8499984741211, 
34.75, 70.9329986572266, 78.25, 69.6829986572266, 74.515998840332, 
70.3659973144531, 67.265998840332, 63.6990013122559, 60.1990013122559
), .Dim = 10L), lon = structure(c(14.4829998016357, 32.4000015258789, 
-8.66600036621094, 15.4670000076294, 18.9160003662109, 19.0160007476807, 
31.0990009307861, 14.3660001754761, 9.59899997711182, 11.0830001831055
), .Dim = 10L)), row.names = c(NA, 10L), class = "data.frame", .Names = c("id", 
"lat", "lon"))

Code:

library(rgeos)
library(sp)


set1sp <- SpatialPoints(set1)
set2sp <- SpatialPoints(set2)
for (i in length(set1$id)){
  for (j in 4:9){
    if(i == 1) {
      sub <- set2
      set1[i,j] <- apply(gDistance(set1sp, set2sp, byid=TRUE), 1, which.min)
      sub <- filter(sub, id != set1[i,j])}
    else{
      set1[i,j] <- apply(gDistance(set1sp, set2sp, byid=TRUE), 1, which.min)
      sub <- filter(sub, id != set1[i,j])}
  }
}

Output error:

 Error in `[<-.data.frame`(`*tmp*`, i, j, value = c(8L, 8L, 8L, 8L, 8L,  : 
  replacement has 10 rows, data has 1 
Community
  • 1
  • 1
Vedda
  • 7,066
  • 6
  • 42
  • 77
  • It's likely that the error is being generated because you are missing a `1:` before `length(set1$id)` – Jared Smith Nov 09 '15 at 00:05
  • What are `set1sp` and `set2sp`? They are not defined. Also, what projection system are your points in? And if you only want to add 5 columns, you probably want `j in 4:8` rather than 9. – Jared Smith Nov 09 '15 at 00:08
  • @JaredSmith Sorry I added set1sp and set2sp – Vedda Nov 09 '15 at 00:37
  • Check your apply lines. [Apply()](http://www.inside-r.org/r-doc/base/apply) returns a vector or array but you set it to a data frame cell value (row,col). – Parfait Nov 09 '15 at 00:39
  • @JaredSmith Thanks, but changed the for loop to include `1:` and it still is not working – Vedda Nov 09 '15 at 01:07
  • @Parfait This apply function works with the same example I posted in the question, so it should work fine. However, I did change it and ams till getting the error – Vedda Nov 09 '15 at 01:08

2 Answers2

1

I had to set the projection system and the coordinates for set1sp and set2sp in order to make gDistance work. I assumed WGS84.

dummyset1= set1
dummyset2= set2
coordinates(set1) = c('lon', 'lat')
coordinates(set2) = c('lon', 'lat')
proj4string(set1) = "+proj=longlat +datum=WGS84"
proj4string(set2) = "+proj=longlat +datum=WGS84"
set1sp = set1
set2sp = set2
set1 = dummyset1
set2 = dummyset2

This loop will return the output you wanted based on using the general structure of your for loop.

for (i in 1:length(set1$id)){
    #Store the projected data in a dummy variable sub
    sub <- set2sp
    for (j in 4:8){
        if (j == 4){
           set1[i,j] <- apply(gDistance(set2sp['id'], set1sp['id'][i,], byid=TRUE), 1, which.min)
           #Remove the index of the closest point from sub.
           sub <- sub[which(sub$id != set1[i,j]), ]
        }
        else {
           #Note that sub is now being checked instead of set2sp. This is because sub has had the index of the closest point removed.
           set1[i,j] <- apply(gDistance(sub['id'], set1sp['id'][i,], byid=TRUE), 1, which.min)
           sub <- sub[which(sub$id != set1[i,j]), ]
        }
    }
}

The resulting output is:

set1
   id      lat      lon V4 V5 V6 V7 V8
1  5984 48.39167 13.67111 10  1  8  7  6
2  7495 48.14806 12.86695 10  1  8  7  6
3  4752 48.72111 15.94223 10  1  8  7  6
4  2654 47.18917 11.09974  1  9  8  7  6
5  4578 47.05444 12.95834  1  9  8  7  6
6  9865 47.12917 14.20389  1  9  8  7  6
7  3265 47.30667 11.86389  1  9  8  7  6
8  1252 47.84000 16.52667  1  9  8  7  6
9  4679 47.30417 16.19306  1  9  8  7  6
10 1346 48.10944 17.07139  1  9  8  7  6
Jared Smith
  • 280
  • 3
  • 12
  • This certainly worked, but I'm getting this error: `50: In RGEOSDistanceFunc(spgeom1, spgeom2, byid, "rgeos_distance") : Spatial object 2 is not projected; GEOS expects planar coordinates` – Vedda Nov 09 '15 at 01:20
  • 1
    I think that warning is a result of using degree coordinates instead of planar coordinates (e.g. UTM coordinates). You can use `spTransform()` to convert your data into a planar system appropriate for your region. – Jared Smith Nov 09 '15 at 02:35
  • 1
    Try `order(apply(spDists(set1,set2),2,min))[1:5]` : `spDists` computes the great circle distances matrix from long/lat data, as defined above; `apply` takes the minimum for every column (point in set2, over set1), order and `[1:5]` finds the five closest. This is not a duplicate question as mentioned above. – Edzer Pebesma Nov 09 '15 at 07:45
  • @EdzerPebesma Thanks, but I tried your code and it didn't work. I get this error: `replacement has 5 rows, data has 1 ` Mind creating another answer using this? – Vedda Nov 09 '15 at 16:40
1

The following computes great circle distances from all points in set 2 wrt set 1. It then takes the minimum over set 1, and orders them; then plots.

library(sp)
coordinates(set1) = c('lon', 'lat')
coordinates(set2) = c('lon', 'lat')
proj4string(set1) = "+proj=longlat +datum=WGS84"
proj4string(set2) = "+proj=longlat +datum=WGS84"
d = apply(spDists(set1,set2),2,min)
order(d)[1:5]
# [1]  1 10  9  2  8
plot(set2, pch=2, axes=TRUE)
points(set1)
o = order(d)[1:5]
points(set2[o,], col = 'red', pch=16)

enter image description here

Edzer Pebesma
  • 3,814
  • 16
  • 26