4

I have millions of GPS coordinates and want to quickly add a column of the country of the coordinates.

My current method works but is extremely slow:

library(data.table)

#REPRODUCE DATA
data <- data.table(latitude=sample(seq(47,52,by=0.001), 1000000, replace = TRUE),
                   longitude=sample(seq(8,23,by=0.001), 1000000, replace = TRUE))

#REQUIRED PACKAGES
if (!require("sp")) install.packages("sp")
if (!require("rworldmap")) install.packages("rworldmap")
if (!require("sf")) install.packages("sf")
library(sp)
library(rworldmap)
library(sf)

#CURRENT SLOW FUNCTION
coords2country = function(points,latcol,loncol){  
  countriesSP <- getMap(resolution='low')
  pointsSP <- st_as_sf(points,coords=c(loncol,latcol),crs=4326)
  pointsSP<- as(pointsSP,"Spatial")
  # use 'over' to get indices of the Polygons object containing each point 
  indices = over(pointsSP, countriesSP)
  # return the ADMIN names of each country
  indices$ADMIN  
  #indices$ISO3 # returns the ISO3 code 
  #indices$continent   # returns the continent (6 continent model)
  #indices$REGION   # returns the continent (7 continent model)
}

#SLOW!
> system.time(data[,country:=coords2country(data,"latitude","longitude"),])
   user  system elapsed 
121.293   7.849 130.226 

Is there a faster/better way to do this? Thanks!

Neal Barsch
  • 2,810
  • 2
  • 13
  • 39
  • I think you have seen [this post](https://stackoverflow.com/questions/14334970/convert-latitude-and-longitude-coordinates-to-country-name-in-r). The other option from this post is to use the `geonames` package. I wonder if this works for you. – jazzurro Oct 21 '18 at 00:29
  • 1
    See [this question](https://stackoverflow.com/questions/17301608/find-in-which-country-each-point-coordinate-belongs-to?rq=1). I tested `map.where()`. This function works really fast. – jazzurro Oct 21 '18 at 00:35
  • @jazzurro, thanks for your help! map.where() is about 10x faster. Cheers! – Neal Barsch Oct 21 '18 at 00:45
  • if most of your points are in the US, Canada, Russia, Australia... you can find manually the biggest rectangle that includes only area from the given country, and pre-filter on those, that could speed up things quite a bit unless the proposed packages already do this kind of optimization. – moodymudskipper Oct 21 '18 at 10:58

1 Answers1

7

There are two similar questions. They are in my comments above. The questions are asking how to get country names from coordinates. Here the OP is asking which is a faster way to do the task.

Based on the posts, we have three options.

  1. to use the custom function in this question;
  2. to use the geonames package; or
  3. to use map.where() in the map package.

The second option needs a bit of setup. So I just tested map.where(). The following is the result. As the OP said, this function is working much faster.

library(maps)
set.seed(111)
data <- data.table(latitude=sample(seq(47,52,by=0.001), 1000000, replace = TRUE),
                   longitude=sample(seq(8,23,by=0.001), 1000000, replace = TRUE))

system.time(data[, country := map.where(x = longitude, y = latitude)])

#   user  system elapsed 
#   7.20    0.05    7.29 
Ray
  • 2,008
  • 14
  • 21
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • 1
    This is really fast, but is there a way to only get the main country instead of for example Australia::Tazmania? – Herman Toothrot Nov 21 '21 at 23:24
  • 1
    @HermanToothrot for the add-on question: you can use `strsplit(country, split = ":")` to generate a list of the `country:region-name` or - alternatively - "replace" all after the special character, i.e. `sub(pattern = ":.*", replacement = "")`. – Ray Jan 27 '22 at 12:58