
I am working with the New York City taxi data set. The data set has columns including datetime, pickup lat/lon, dropoff lat/lon etc. Now I want to reverse geocode the lat/lon to find the borough/neighborhood. I came across geopy and found that something like this worked perfectly:

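```python
from geopy.geocoders import Nominatim

# Recent geopy releases require an explicit user_agent; this name is a placeholder.
geolocator = Nominatim(user_agent="nyc-taxi-reverse-geocode")

borough = []
loc = ['40.764141, -73.954430', '40.78993085, -73.9496098723']
for coords in loc:
    # reverse() returns a Location; str() gives its comma-separated address
    sub = str(geolocator.reverse(coords))
    # the third address component was the neighborhood for these two points
    borough.append(sub.split(', ')[2])
borough
## ['Upper East Side', 'East Harlem']
```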

This is exactly what I want. However, my dataset has a few million rows, and querying an online API once per row is not feasible at that scale. Is there a better way to accomplish this, ideally one that works offline?

ytk
  • If you can get shapefiles of the boroughs (which shouldn't be too hard), you can use various shapefile-manipulation tools to determine if a given point is inside a given borough's shape. – BrenBarn Jan 25 '16 at 02:07
  • That sounds interesting. If you don't mind, can you provide some more details? – ytk Jan 25 '16 at 02:11
  • See for instance [this page](http://streamhacker.com/2010/03/23/python-point-in-polygon-shapely/). If you google for "determine whether point is in shapefile shape" or similar queries you can find lots of things. Since the NYC boroughs are counties, it should be fairly easy to get shapefiles of their boundaries from something like the US Census. – BrenBarn Jan 25 '16 at 02:20
  • I'm trying to do the same thing on a huge dataset. Have you been able to solve it? – kthouz Sep 15 '16 at 03:35
  • I downloaded this file: https://data.cityofnewyork.us/City-Government/Neighborhood-Names-GIS/99bc-9p23. It contains neighborhood names and their centroids. I then used the answer to [this](http://stackoverflow.com/questions/15736995/how-can-i-quickly-estimate-the-distance-between-two-latitude-longitude-points) question to find the neighborhood whose centroid was closest to each data point, and assigned the point to that neighborhood. – ytk Sep 15 '16 at 14:17
  • Dear @ytk, I have the same problem with the same data set. I've been looking for a solution online for days but haven't found anything (I'm using Spark on R). Could you share your solution here (or privately at pablopicciau@gmail.com)? I would be extremely grateful. – HABLOH Oct 24 '19 at 18:27
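
The nearest-centroid approach described in the comments above could be sketched roughly as follows. This is a minimal illustration, not ytk's actual code: the function names are made up for the example, and the centroid coordinates are rough illustrative values rather than entries taken from the Neighborhood Names file.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical centroids as (lat, lon); in practice these would be loaded
# from the downloaded neighborhood file instead of being hard-coded.
CENTROIDS = {
    "Upper East Side": (40.7736, -73.9566),
    "East Harlem": (40.7957, -73.9389),
    "Midtown": (40.7549, -73.9840),
}

def nearest_neighborhood(lat, lon, centroids=CENTROIDS):
    """Return the name whose centroid is closest to the given point."""
    return min(centroids, key=lambda name: haversine_km(lat, lon, *centroids[name]))
```

Calling `nearest_neighborhood(40.764141, -73.954430)` runs entirely offline. With a few million rows and a few hundred centroids, a linear scan like this may be slow, so a vectorized or spatially indexed version would probably be worth considering.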

2 Answers


You can give Reverse Geocoder a try, as I believe it provides the functionality you need.
It works offline: given a latitude/longitude coordinate, it returns the nearest town or city, the country, and the administrative level 1 and 2 regions.

Yannis
  • Please take a minute and explain what Reverse Geocoder does as this is a link only answer which is frowned upon. Answers should be able to live without relying on external sources. – DᴀʀᴛʜVᴀᴅᴇʀ Nov 17 '16 at 18:18
  • Reverse Geocoder takes a latitude / longitude coordinate and returns the nearest town/city, country, administrative 1 & 2 regions. – Yannis Nov 17 '16 at 18:40

Check out this answer for a good approach. You might have to define your own polygons for the shapes, though.
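
As a rough sketch of what a point-in-polygon lookup with hand-defined polygons might look like, here is a pure-Python ray-casting test. The area names and rectangle coordinates are hypothetical placeholders, not real neighborhood boundaries; with real shapefiles, a library such as Shapely (as in the page linked in the comments on the question) would likely be a better fit.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular areas, for illustration only.
HYPOTHETICAL_AREAS = {
    "Area A": [(-73.97, 40.76), (-73.94, 40.76), (-73.94, 40.78), (-73.97, 40.78)],
    "Area B": [(-73.96, 40.78), (-73.93, 40.78), (-73.93, 40.80), (-73.96, 40.80)],
}

def classify(lat, lon, areas=HYPOTHETICAL_AREAS):
    """Return the first area containing the point, or None if none does."""
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Points that fall outside every polygon come back as `None`, which makes gaps in the polygon set easy to spot.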

aghast