
I have a data.table that holds ids and locations. For example, here it is with two rows (it has column and row names; I don't know if that matters):

locations<-data.table(c(11,12),c(-159.58,0.2),c(21.901,22.221))
colnames(locations)<-c("id","location_lon","location_lat")
rownames(locations)<-c("1","2")

I then want to iterate over the rows and compare them to another point (with lat,lon). In a for loop it works:

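# Wrapped in a function so that return() is valid; gdist() comes from the Imap package
find_nearby <- function(locations) {
    for (i in seq_len(nrow(locations))) {
        loc <- locations[i,]
        dist <- gdist(-159.5801, 21.901, loc$location_lon, loc$location_lat, units="m")
        if (dist <= 50) {
            return(loc)
        }
    }
    # Reached only when no row is within 50 m
    return(NULL)
}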

and returns:

   id location_lon location_lat
1: 11      -159.58       21.901

But I want to use `apply`. The following code fails to run:

dists <- apply(locations,1,function(x) if (50 - gdist(-159.5801, 21.901, x$location_lon, x$location_lat, units="m")>=0) x else NULL)

It fails with the error `$ operator is invalid for atomic vectors`. Switching to positional references (x[2], x[3]) isn't enough to fix this; I then get

Error in if (radius - gdist(lon, lat, x[2], x[3], units = "m") >= 0) x else NULL : 
missing value where TRUE/FALSE needed 

This is because `apply` converts the data.table to a matrix, and the coordinates are treated as text instead of numbers. Is there a way to overcome this? The solution needs to be efficient (I want to run this check for more than 1,000,000 different coordinates). Changing the data structure of the locations table is possible if needed.
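A minimal base-R sketch of the coercion described above. It assumes the real table also holds at least one non-numeric column (the `name` column below is hypothetical and not part of the example table); with only numeric columns the matrix would stay numeric.

```r
# Toy data.frame standing in for the table; `name` is a hypothetical extra column.
df <- data.frame(id = c(11, 12),
                 location_lon = c(-159.58, 0.2),
                 location_lat = c(21.901, 22.221),
                 name = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() converts its input with as.matrix() first. A matrix has a single
# type, so one character column turns every cell into a character string.
m <- as.matrix(df)
typeof(m)

# Each row reaches the function as a named atomic vector, so `$` is not
# available. Elements can be read by name with [[ ]] and converted back.
lons <- apply(df, 1, function(x) as.numeric(x[["location_lon"]]))
```

Converting inside the function gets the numbers back, but it still pays the cost of the matrix conversion and of one function call per row.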

KeshetE
  • I added this as an example of working code. In the real loop everything is passed as variables and the row name is automatically generated. – KeshetE Jan 08 '15 at 09:37
  • Where does the `gdist` function come from? – nicola Jan 08 '15 at 09:38
  • Can you create a data set with more than one row and provide your desired output? I also don't see a reason whatsoever to use `data.table` if all you do are `for` loops and `apply` loops without using any built in `data.table` features – David Arenburg Jan 08 '15 at 09:39
  • `gdist` comes from the `Imap` package – KeshetE Jan 08 '15 at 09:39
  • Don't know the package. Is `gdist` vectorized? If so, you don't need any loop. If not, use any of the `dist*` functions from `geosphere`. – nicola Jan 08 '15 at 09:44

1 Answer


No loops are required; just use data.table as intended. If all you want to see are the rows that are within 50 meters of the desired location, all you have to do is

locations[, if (gdist(-159.58, 21.901, location_lon, location_lat, units="m") <= 50) .SD, id]
##    id location_lon location_lat
## 1: 11      -159.58       21.901

Here we group by the id column within the locations data set itself and check whether each id is within 50 meters of -159.58, 21.901. If so, we return .SD, which is the subset of the data for that specific id.
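If the distance function is vectorized, as raised in the comments, grouping by id is not needed either. The sketch below is an illustration under stated assumptions, not the `Imap` implementation: `haversine_m` is a hypothetical hand-written helper that uses the haversine formula on a sphere of radius 6,371,000 m, so its results can differ slightly from `gdist`. A plain data.frame is used to keep it dependency-free.

```r
# Hypothetical helper (not from Imap): great-circle distance in meters,
# haversine formula, spherical Earth with an assumed radius of 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

locations <- data.frame(id = c(11, 12),
                        location_lon = c(-159.58, 0.2),
                        location_lat = c(21.901, 22.221))

# One vectorized call computes the distance for every row at once.
d <- haversine_m(-159.5801, 21.901,
                 locations$location_lon, locations$location_lat)

# Keep only the rows within 50 meters of the reference point.
nearby <- locations[d <= 50, ]
```

With a data.table, the same filter should be expressible directly in `i`, for example `locations[haversine_m(-159.5801, 21.901, location_lon, location_lat) <= 50]`.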


As a side note, data.table doesn't have row.names, so there is no need to specify them; see here, for example

David Arenburg