I am trying to get the ZIP code for lat-long coordinates in the New York region.

I tried the reverse geocoding API from Google, but it is limited to 2,500 requests per day, so I cannot process my whole data frame in one batch.

Next, I tried library(zipcode) with its zipcode dataset, but I could not match its latitude/longitude values to the coordinates in my training data set, because not all of my lat-long coordinates appear in that dataset.

I then thought of using KNN to predict the ZIP code for each row, but I can't get correct results.

zipcode_latlon = zipcode[zipcode$state=="NY",c(1,4,5)]
train_latlon = train_data[,c("latitude","longitude")]
zip1 = rep(10007, nrow(train_latlon))
zip1 = as.character(zip1)
train_latlon = cbind(zip1, train_latlon)
colnames(train_latlon) = c("zip","latitude","longitude")
knn_fit = knn(zipcode_latlon, train_latlon,zipcode_latlon$zip, k=1)

I need to know how to get ZIP codes from lat-long coordinates in batch; any method in R would be good.

iskandarblue
  • does that work? – iskandarblue Feb 20 '17 at 15:47
  • Thanks! Yes, it worked for me. But I have difficulty understanding the shapefile format, and I could not understand the arguments to spTransform(). – suvir gupta Feb 20 '17 at 16:56
  • The original shapefile is in the coordinate reference system NAD83. spTransform just gives `zips` a different reference system - in this case WGS84. It is important that both the points and polygons have the same CRS. It is also possible to leave `zips` the way it is after importing, and then to give `spdf` the NAD83 CRS with `+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0`. In any case, spatial data is meaningless without a reference system. – iskandarblue Feb 20 '17 at 17:15

1 Answer

I think you are going about this the wrong way. You can find the zip codes of lat/lon coordinates without a geocoder: all you need is to download the US zipcodes shapefile (the code below reads cb_2015_us_zcta510_500k.shp) and then do a spatial join:

library(sp)
library(rgdal)

#import zips shapefile and transform CRS 
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

# here is a sample with three cities in New York State and their coordinates
# (x = longitude, y = latitude)
df <- data.frame(
  lon = c(-76.1474, -77.6109, -78.8784),
  lat = c(43.0481, 43.1610, 42.8864),
  city = c("Syracuse", "Rochester", "Buffalo")
)

df
       lon     lat      city
1 -76.1474 43.0481  Syracuse
2 -77.6109 43.1610 Rochester
3 -78.8784 42.8864   Buffalo

# extract only the coordinates, in lon/lat (x/y) order
xy <- df[, c("lon", "lat")]

# transform coordinates into a SpatialPointsDataFrame
spdf <- SpatialPointsDataFrame(coords = xy, data = df, proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

# subset only the zipcodes in which points are found
zips_subset <- zips[spdf, ]

# NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
# use over() to overlay points in polygons and then add that to the original dataframe

df$zip <- over(spdf, zips_subset[, "ZCTA5CE10"])

And voila! You have the zipcode of each point:

df
       lon     lat      city ZCTA5CE10
1 -76.1474 43.0481  Syracuse     13202
2 -77.6109 43.1610 Rochester     14604
3 -78.8784 42.8864   Buffalo     14202
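
The code above relies on sp and rgdal, and rgdal has since been retired from CRAN. As a rough sketch only, the same point-in-polygon join can be written with the sf package instead (a swapped-in library, not what this answer originally used). To stay self-contained, the example below uses two made-up square polygons and made-up points in place of the real shapefile; `zips_toy`, `square()`, and the "00001"/"00002" codes are hypothetical. With the real file you would presumably read it with `st_read()` and transform it with `st_transform()` first.

```r
library(sf)

# Helper that builds a closed square polygon from its lower-left corner.
# Hypothetical geometry for illustration, NOT real ZIP code boundaries.
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Toy stand-in for the ZCTA polygons, in WGS84 (EPSG:4326)
zips_toy <- st_sf(
  ZCTA5CE10 = c("00001", "00002"),
  geometry = st_sfc(square(-77, 42.5), square(-76, 42.5), crs = 4326)
)

# Made-up points; the third one lies outside both polygons
pts <- data.frame(
  city = c("A", "B", "C"),
  lon = c(-76.5, -75.5, -70),
  lat = c(43, 43, 43)
)

# coords are given in x/y order, i.e. longitude first, then latitude
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Left join: every point is kept, and points that fall in no polygon get NA
joined <- st_join(pts_sf, zips_toy, join = st_within)
joined$ZCTA5CE10
```

As with the sp version, both layers need the same CRS before joining, and points that fall outside every polygon come back as NA rather than being dropped.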
iskandarblue
  • Thanks! It works for me now, but I could not interpret CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"). I probably don't understand spatial data files. How should we read this? – suvir gupta Feb 20 '17 at 16:43
  • I am able to replicate your example, but when I apply it to my project, the last step with over() gives me "cannot get a slot ("Polygons") from an object of type "NULL"". Do you have any idea where my mistake could be? – Agustín Indaco Sep 19 '18 at 19:29
  • I think I found the problem now: you have "lat" and "lon" mixed up in your colnames, and over() takes coordinates in the order lon, then lat. So in the example we get the right answer, but you should really relabel the column names. Does that make sense? – Agustín Indaco Sep 19 '18 at 20:18