
I would like to find a solution to the same problem presented in this question. I quote the part that interests me:

I am working with the New York City taxi data set. The data set has columns including datetime, pickup lat/lon, dropoff lat/lon etc. Now I want to reverse geocode the lat/lon to find the borough/neighborhood

My dataset has a few million rows, so I need a computationally efficient method. I downloaded this file; it contains neighborhood names and their centroids. I would like to use the same method as this answer to find the neighborhood whose centroid is closest and then classify the data point to that neighborhood.

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km

The problem is that I would like to use the same method, but in R. Alternatively, I am open to another equally efficient method.
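To make the intended method concrete, here is a minimal NumPy sketch of the nearest-centroid lookup (the question asks for R, but the vectorized idea carries over directly, e.g. with `geosphere::distHaversine` and `which.min`). The neighborhood names and centroid coordinates below are placeholders, not values from the linked file.

```python
import numpy as np

def haversine_km(lon1, lat1, lons2, lats2):
    """Vectorized great-circle distance (km) from one point to many points."""
    lon1, lat1 = np.radians(lon1), np.radians(lat1)
    lons2, lats2 = np.radians(lons2), np.radians(lats2)
    dlon = lons2 - lon1
    dlat = lats2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lats2) * np.sin(dlon / 2) ** 2
    return 6371 * 2 * np.arcsin(np.sqrt(a))  # mean Earth radius 6371 km

# Toy centroid table (name, lon, lat) -- placeholder values for illustration.
names = np.array(["A", "B", "C"])
cent_lon = np.array([-74.00, -73.95, -73.90])
cent_lat = np.array([40.70, 40.75, 40.80])

def nearest_neighborhood(lon, lat):
    """Classify one pickup point to the neighborhood with the closest centroid."""
    d = haversine_km(lon, lat, cent_lon, cent_lat)
    return names[np.argmin(d)]

print(nearest_neighborhood(-73.96, 40.76))
```

Because the distances to all centroids are computed in one vectorized call, there is no inner Python loop per centroid; only the loop over taxi rows remains, and that too can be vectorized.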

DATASET -> Download (1.8 GB)

Gregor Thomas
HABLOH
  • A couple of suggestions: first, there is the geosphere package, useful for performing the distance calculations. As far as a computationally efficient method goes, consider applying a divide-and-conquer strategy. There is no reason to compute the distances for taxis on Staten Island to neighborhoods in Brooklyn and Queens. – Dave2e Oct 18 '19 at 19:48
  • Maybe this is just for your own learning, but I thought the NYC taxi data has the borough-level pickup/dropoff location for each taxi ride? – creutzml Oct 18 '19 at 19:57
  • Do you know https://gis.stackexchange.com/help/on-topic? – jay.sf Oct 18 '19 at 21:21
  • It would be helpful if you create a much smaller subset of the data, so folks don't need to download 1.8 GB worth – camille Oct 19 '19 at 03:06
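Regarding the efficiency comments above: for a few million rows, the main cost is the points × centroids distance matrix. One sketch (in NumPy; the same shape of computation works in R with matrix arithmetic) is to process rows in chunks so that the full matrix never materializes, and to take the argmin on the haversine `a` term directly, since the final `2 * asin(sqrt(a))` step is monotone and does not change which centroid is closest. The centroid values here are placeholders.

```python
import numpy as np

def classify_chunked(lons, lats, cent_lon, cent_lat, names, chunk=100_000):
    """Assign each (lon, lat) point to the nearest centroid, processing rows
    in chunks so only a (chunk x n_centroids) block is in memory at a time."""
    out = np.empty(len(lons), dtype=names.dtype)
    rlon, rlat = np.radians(cent_lon), np.radians(cent_lat)
    for s in range(0, len(lons), chunk):
        lo = np.radians(lons[s:s + chunk])[:, None]  # column vector for broadcasting
        la = np.radians(lats[s:s + chunk])[:, None]
        a = (np.sin((rlat - la) / 2) ** 2
             + np.cos(la) * np.cos(rlat) * np.sin((rlon - lo) / 2) ** 2)
        # argmin of `a` == argmin of haversine distance (asin and sqrt are monotone)
        out[s:s + chunk] = names[np.argmin(a, axis=1)]
    return out

# Placeholder centroids and a few sample pickup points.
names = np.array(["A", "B", "C"])
cent_lon = np.array([-74.00, -73.95, -73.90])
cent_lat = np.array([40.70, 40.75, 40.80])
pickup_lon = np.array([-73.96, -74.01, -73.89])
pickup_lat = np.array([40.76, 40.69, 40.81])

print(classify_chunked(pickup_lon, pickup_lat, cent_lon, cent_lat, names, chunk=2))
```

Dave2e's divide-and-conquer suggestion would slot in before this step: pre-filter candidate centroids per coarse region (e.g. a lat/lon grid cell or borough bounding box) so each chunk only compares against nearby centroids instead of all of them.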

0 Answers