-1

I am working with a three variable dataset of over 270,000 observations. The three variables are the observation value, latitude and longitude. In a previous, related, post I managed to get help on how to allow the reverse geocode function to skip observations with missing values for latitude and longitude: How to handle missing values when using a reverse geocode function?

Reproducible example:

Data <- data.frame(
  Observation = 1:5,
  Longitude = c(116.3880005, 53, -97, NA, NA), 
  Latitude = c(39.92890167, 32, 32, NA, NA))

The following code works. However, it produces the factor index for each country instead of the ISO3 I wish to obtain.

library(sp)
library(rworldmap)

coords2country = function(points)
{  
  countriesSP <- getMap(resolution='low')
  #countriesSP <- getMap(resolution='high') #you could use high res map from rworldxtra if you were concerned about detail
  # new changes in worldmap means you have to use this new CRS (bogdan):
  pointsSP = SpatialPoints(points, proj4string=CRS(" +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))

  # use 'over' to get indices of the Polygons object containing each point 
  indices = over(pointsSP, countriesSP)

  # return the ADMIN names of each country
  indices$ADMIN  
  indices$ISO3 #would return the ISO3 code
}

#The function below fixes the NA error
coords2country_NAsafe <- function(points)
{
  bad <- with(points, is.na(lon) | is.na(lat))
  result <- character(length(bad))
  result[!bad] <- coords2country(points[!bad,])
  result
}   

The following produces the factor index (not the ISO3 code):

coords2country_NAsafe(points)

I would like to know I could modify the code I have above in order to output the ISO3 codes, and not their factor indexes.

Community
  • 1
  • 1
ealfons1
  • 353
  • 1
  • 6
  • 24
  • How are you reading in that points file and passing it to the function? Why not simply include the first 10 points here and show us what you get, and what country you expect to get? Are you sure you haven't just mixed lat and long up? – Spacedman Aug 21 '13 at 12:29
  • "over 27,000" is an understatement. There are 257,388 rows in that data file. – Spacedman Aug 21 '13 at 12:40
  • Hello, my apologies for the typo on the number of observations. I am positive that I have not mixed up the lat and long. What I am trying to accomplish is to insert a new column into the csv file containing the appropriate ISO3 codes for each observation. – ealfons1 Aug 21 '13 at 13:37
  • with an NA for missing data? You REALLY need to show us a few records from points, and what you get, and how they are wrong. – Spacedman Aug 21 '13 at 13:40
  • I now provided an example of how erroneous the output I get, in terms of the ISO3 values produced, in the first row of the csv file. – ealfons1 Aug 21 '13 at 14:00
  • That doesn't really help improve the question. Show, by editing your question, what your input data - or a subset of it - looks like. Then show how you call the functions you defined, and what the output comes out as, and how it is wrong. People shouldn't have to download a 7Mb CSV file to help you. – Spacedman Aug 21 '13 at 15:39

2 Answers2

2

This all looks good to me:

> pts=read.table("points.csv",head=TRUE,sep=",")
> pts=subset(pts,!is.na(lon)) # just take the good ones to save NA faffing
> coordinates(pts)=~lon+lat
> first = pts[1:100,]         # take first 100 for starters
> cc = coords2country(first)
> plot(first,pch=".")
> text(coordinates(first),label=cc)

enter image description here

All the countries in the right place...

Spacedman
  • 92,590
  • 12
  • 140
  • 224
  • Thank you for providing the plot code. However, my end goal is to create a new column in my csv file, containing the ISO3 codes for each observation. – ealfons1 Aug 21 '13 at 13:38
2

I think your principal problem was that you were outputting the factor index for each ISO3 code instead of the ISO3 codes themselves. Thus you had 42 for China because China is the 42nd country in the map. The as.character() below sorts that.

So making minor edits to your & Barry's code gives the code below which I think does what you want.

Replace 'first' with 'pts' in the final 4 lines to run for your whole dataset.

coords2country = function(points)
{  
  library(rworldmap)
  countriesSP <- getMap(resolution='low')

  #I assume that points have same projection as the map
  pointsSP = SpatialPoints(points, proj4string=CRS(proj4string(countriesSP)))  

  # use 'over' to get indices of the Polygons object containing each point 
  indices = over(pointsSP, countriesSP)

  #as.character(indices$ADMIN) # return the ADMIN names of each country  
  as.character(indices$ISO3) # return the ISO3 code
  #as.character(indices$ISO_N3) # return the ISO numeric code
}


library(sp)
pts=read.table("points.csv",head=TRUE,sep=",")
pts=subset(pts,!is.na(lon)) # just take the good ones to save NA faffing
coordinates(pts)=~lon+lat
first = pts[1:100,]         # take first 100 for starters
cc = coords2country(first)
plot(first,pch=".")
text(coordinates(first),label=cc)

firstPlusISO3 <- cbind(coordinates(first),cc)  
Andy
  • 1,821
  • 13
  • 23