
I am using NYC trip data and want to convert the lat-long pairs in the data to their respective NYC boroughs. I especially want to know whether an NYC airport (LaGuardia/JFK) appears in any of those trips.

I know that the Google Maps API and libraries like Geopy can do reverse geocoding. However, most of them return only city- and country-level results.

I want to extract the borough or airport name (e.g. Queens, Manhattan, JFK, LaGuardia) from each lat-long. I have lat-longs for both pickup and dropoff locations.

Here is a sample of the dataset as a pandas DataFrame:

    VendorID    lpep_pickup_datetime    Lpep_dropoff_datetime   Store_and_fwd_flag  RateCodeID  Pickup_longitude    Pickup_latitude Dropoff_longitude   Dropoff_latitude    Passenger_count Trip_distance   Fare_amount Extra   MTA_tax Tip_amount  Tolls_amount    Ehail_fee   improvement_surcharge   Total_amount    Payment_type    Trip_type
0   2   2015-09-01 00:02:34 2015-09-01 00:02:38 N   5   -73.979485  40.684956   -73.979431  40.685020   1   0.00    7.8 0.0 0.0 1.95    0.0 NaN 0.0 9.75    1   2.0
1   2   2015-09-01 00:04:20 2015-09-01 00:04:24 N   5   -74.010796  40.912216   -74.010780  40.912212   1   0.00    45.0    0.0 0.0 0.00    0.0 NaN 0.0 45.00   1   2.0
2   2   2015-09-01 00:01:50 2015-09-01 00:04:24 N   1   -73.921410  40.766708   -73.914413  40.764687   1   0.59    4.0 0.5 0.5 0.50    0.0 NaN 0.3 5.80    1   1.0

You can find the data here too:

http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

After a bit of research, I found that I can leverage the Google Maps API to get county-level and even establishment-level data.

Here is the code I wrote:

A mapper function that gets the geocode data from the Google API for the lat-long passed in:

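import requests

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def reverse_geocode(latlng):
    """Return the first Geocoding API result for a 'lat,lng' string, or {} if there is none."""
    # Note: the Geocoding API normally also expects an API key in a `key` parameter.
    response = requests.get(GEOCODE_URL, params={'latlng': latlng}, timeout=10)
    results = response.json().get('results', [])
    return results[0] if results else {}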


# Reverse-geocode each pickup lat-long once and reuse the results for both columns
pickup_results = [reverse_geocode(ll) for ll in trip_data_sample["lat_long_pickup"].values]
trip_data_sample["est_pickup"] = [r["address_components"][0]["long_name"] for r in pickup_results]
trip_data_sample["locality_pickup"] = [r["address_components"][2]["long_name"] for r in pickup_results]

However, I initially had 1.4 million records, and this was taking a very long time to run. So I reduced the sample to 200K records; that was still too slow, so I cut it to 115K, which also took too long.

So now I have reduced it to 50K. But a sample that small would hardly have a representative distribution of the whole dataset.

I was wondering if there is a better and faster way to reverse geocode the lat-long pairs. I am not using Spark since I am running this on a local Mac, so Spark might not give much of a speedup on a single machine. Please advise.
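
One possible way to reduce the number of API calls, sketched here under assumptions rather than offered as a tested solution: round each coordinate pair and look up each distinct rounded pair only once. The helper name `geocode_unique`, its `lookup` parameter, and the default `precision` of 3 decimal places (roughly 100 m of latitude) are hypothetical choices for illustration, and nearby points on either side of a boundary may be merged at that precision.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def geocode_unique(
    points: Iterable[Tuple[float, float]],
    lookup: Callable[[str], dict],
    precision: int = 3,
) -> List[dict]:
    """Call `lookup` once per distinct rounded (lat, lng) pair.

    `lookup` is any function taking a 'lat,lng' string, for example a
    reverse-geocoding function. Results are returned in input order.
    """
    cache: Dict[Tuple[float, float], dict] = {}
    results: List[dict] = []
    for lat, lng in points:
        # Rounding is an assumption: it merges points that are very close together.
        key = (round(lat, precision), round(lng, precision))
        if key not in cache:
            cache[key] = lookup("{},{}".format(*key))
        results.append(cache[key])
    return results
```

With the code above, this would be called as something like `geocode_unique(zip(df["Pickup_latitude"], df["Pickup_longitude"]), reverse_geocode)`. How much it helps depends on how many trips share nearly the same pickup or dropoff point.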

Baktaawar
  • You need to find a shapefile of the counties in New York, and then use a Python GIS library, like, `ogr` or `shapely` to see which (if any) county your lat/lon point intersects. See [this](http://stackoverflow.com/questions/7861196/check-if-a-geopoint-with-latitude-and-longitude-is-within-a-shapefile) question. – john_science Feb 28 '17 at 19:20
  • You are either asking for a library or free coding, neither of which is appropriate for SO. – Mad Physicist Feb 28 '17 at 19:22
  • Both are appropriate for SO. I am asking for help and that does involve people helping with a solution. Pls if you don't have answer then I would appreciate if you don't comment and mark my question as "close" since it is very much valid. – Baktaawar Feb 28 '17 at 19:25
  • @Baktaawar I understand that you consider that this question is valid since you posted it. That does not necessarily mean that the community agrees with you. In the same way that it may not agree with my close vote. – Mad Physicist Feb 28 '17 at 19:28
  • Just as an FYI: Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – Mad Physicist Feb 28 '17 at 19:29
  • At the very least, you should have some code using the libraries giving incomplete answers. – Mad Physicist Feb 28 '17 at 19:30
  • @MadPhysicist I did not ask for a library. If you read my question correctly, I did mention few ones I have explored. So I did do some research on this already. Secondly, I asked a help on suggestions on how to approach this. Instead of reading the whole question DO NOT just mark something as not correct. And dont comment asking for free code. I asked suggestion and no where specifically i said free code. Do not show your frustration by saying wrong things – Baktaawar Feb 28 '17 at 19:31
  • @Baktaawar. I believe that I read your entire text. I see no attempt on your part to solve this problem, just a vague request for help, which is why I marked it as too broad. I do not think there is much you can do short of editing your question to convince me that I should remove my close vote. If no one else agrees with me, the close vote will expire in due time and you have nothing to worry about. This is a pretty democratic process and a lone nut like myself can't do too much damage without a general consensus. – Mad Physicist Feb 28 '17 at 19:35
  • US Census provides geocoding tools for the zip code level. Look for info on the MAF/TIGER database. And you can cross reference zip code to counties. Then, as others suggested, use a GIS library to map the answer. – JavoSN Feb 28 '17 at 19:46
  • Thanks JavoSN. I just found using Google Maps API i can get the result as form of dictionary. Then I just need to parse to the right key and get the corresponding locality and name. Looks it would work probably. Tried for couple. I need to know how can I map those co-ordinates on a map to show some data – Baktaawar Feb 28 '17 at 20:01
  • @MadPhysicist I updated my question. I hope it meets your standards now and shows an attempt or approach. – Baktaawar Mar 01 '17 at 07:58
  • @Baktaawar. This is now an excellent question. It has a clear problem statement based on something you are actually trying to achieve. +1 – Mad Physicist Mar 01 '17 at 11:12
  • This service includes the county level. Not sure what it takes to get a key, though. https://maps.alk.com/PCMDoc/ReverseGeocoding – Conner M. Jan 29 '18 at 02:08
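
The shapefile suggestion in the comments amounts to a local point-in-polygon test, which avoids network calls entirely. The following is a minimal pure-Python sketch of that idea using ray casting. The `ZONES` rectangles are rough, hypothetical placeholders around the two airports, not real boundaries; in practice the polygons would presumably be loaded from an official borough or airport shapefile, and a library such as `shapely` could replace the hand-written test.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# HYPOTHETICAL rough rectangles for illustration only; replace them with
# real boundary polygons (e.g. read from a shapefile) before relying on this.
ZONES: Dict[str, List[Point]] = {
    "JFK": [(-73.83, 40.62), (-73.74, 40.62), (-73.74, 40.67), (-73.83, 40.67)],
    "LaGuardia": [(-73.89, 40.765), (-73.855, 40.765), (-73.855, 40.785), (-73.89, 40.785)],
}


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify(lon: float, lat: float, zones: Dict[str, List[Point]] = ZONES) -> Optional[str]:
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

For example, `classify(-73.979485, 40.684956)` on the first sample pickup returns `None` with these placeholder zones, since that point lies outside both rectangles. Because everything runs locally, the cost per point is a handful of arithmetic operations per polygon edge.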
