0

I have a data frame with locations for example like this:

id       lat         lon
 a  51.50549  -0.0924609
 b  37.80248 -122.416634
 c  51.50609  -0.1238904

and so on.

My goal is to make a loop so when one chooses location "a" there will be a subset which will include only locations in the radius of 500km from the "a" and exclude everything which is further than this. I think it should be a loop because there will be a constant change of a picked location so the final subset will be unique for each chosen location.

alistaire
  • 42,459
  • 4
  • 77
  • 117
  • What do the locations and final result look like? You need [to make your question reproducible.](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) That said, `dist`—or better `geosphere::distHaversine`—is useful. – alistaire May 22 '17 at 18:05

3 Answers3

1

As suggested you could use geosphere library. I asume your data is a data.table:

library(geosphere)
library(data.table)

# asumming your dataframe is named "d"

d <- data.table(d)

# CALCULATE DISTANCE (dist) TO EACH ID (dist_to)

 for (i in 1:nrow(d)) {
      print (d[i,]$id)
      for (j in 1:nrow(d)) {

         d1 <- d[id == d[i,]$id, dist:=distm(c(d[i,]$lon, d[i,]$lat), c(d[j,]$lon, d[j,]$lat), fun = distHaversine)/1000, ]
         d1 <- d1[, dist_to:= d[j,]$id,]
        if(exists('d2')){ d2<-rbindlist(list(d2,d1))} else {d2<-copy(d1)}
  }
}


head(d) 
    id   lat          lon         dist      dist_to
 1:  a 51.50549   -0.0924609    0.000000       a
 2:  b 37.80248 -122.4166340 8623.657407       a
 3:  c 51.50609   -0.1238904    0.000000       a
 4:  a 51.50549   -0.0924609 8625.195873       b
 5:  b 37.80248 -122.4166340 8623.657407       b
 6:  c 51.50609   -0.1238904    0.000000       b

# SELECT DISTANCES LESS THAN 500kms

  d[dist <= 500,]

   id      lat        lon     dist     dist_to
  1:  a 51.50549 -0.0924609 2.178749       c
  2:  c 51.50609 -0.1238904 0.000000       c

Hope it helps.

COLO
  • 1,024
  • 16
  • 28
  • Hey, i tried but i get the error: Item 1 of list input is not a data.frame, data.table or list I also tried just to copy and run your code with the df you build and still got the same error – Elvira Yushkova May 24 '17 at 12:47
0

You can use the sp package to calculate distance from degree coordinates:

library(sp)

# reproducing your example dataset
id<-c("a","b","c")
lat<-c(51.50549,37.80248,51.50609)
lon<-c(-0.0924609, -122.416634, -0.1238904)
d<-data.frame(id,lat,lon)
d
  id      lat          lon
1  a 51.50549   -0.0924609
2  b 37.80248 -122.4166340
3  c 51.50609   -0.1238904

# x and y coordinates must be in first two columns
d2<-as.matrix(d[,-1])

# spDistsN1 function from sp package calculates distance 
# specify longlat=TRUE if you use degrees, gives result in km
s<-nrow(d)
km2<-lapply(1:s,function(i) d[which(spDistsN1(d2,d2[i,],longlat=TRUE)<=500),])

# the result is in a list, each element corresponding to each record
km2
[[1]]
  id      lat       long
1  a 51.50549 -0.0924609
3  c 51.50609 -0.1238904

[[2]]
  id      lat      long
2  b 37.80248 -122.4166

[[3]]
  id      lat       long
1  a 51.50549 -0.0924609
3  c 51.50609 -0.1238904
mshea
  • 41
  • 5
0

Using a nested loop seems a bit too much IMHO.

I would propose something around these lines, using only the functionality from the raster package:

#load package
library(raster)

# create dataframe
df <- data.frame(id=c('a','b','c'), lat=c(51.50549,37.80248,51.50609), lon= c(-0.0924609,-122.416634,-0.1238904))

Now, since these are point locations with Latitude and Longitude, I'm creating a projection string with a geographic coordinate system:

pj <- CRS('+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0')

I think the cleanest way would be to define a little function which does exactly what you want (I call it getPoints):

getPoints <- function(df,id){

  refpoint <- SpatialPoints(df[df$id==id,3:2],proj4string=pj)

  pdist <- pointDistance(refpoint,SpatialPoints(df[,3:2],proj4string=pj),lonlat = T)

  return(df[pdist < 5000,])
}

To calculate the distance between points, I'm using the pointDistance from the raster package. Before/while calculating the distance, I need to convert the coordinates into spatial points with SpatialPoints and the defined projection pj. I'm selecting the reference point refpoint with the idsupplied to the function and calculate the distance to all points in the dataframe df. I'm also calculating the distance to the point itself, so I can index all points with the condition pdist <= 500000 and it will also return the selected point (since the distance is 0).

On a sidenote, I could have avoided assigning the refpoint to a variable, but it's a bit clearer like this.

Finally you could either loop over your point locations

for (id_sel in df$id){

print(getPoints(df,id_sel))

}

or use lapply, which will conveniently save my results to a list and avoid looping in general:

lapply(df$id,function(x) getPoints(df,x))
Val
  • 6,585
  • 5
  • 22
  • 52