I would like to find a solution to the same problem presented in this question. I quote the part that interests me:
I am working with the New York City taxi data set. The data set has columns including datetime, pickup lat/lon, dropoff lat/lon, etc. Now I want to reverse geocode the lat/lon to find the borough/neighborhood.
My dataset has a few million rows, so I need a computationally efficient method. I downloaded this file. It contains neighborhood names and their centroids. I would like to use the same method as this answer in this question: find the neighborhood whose centroid is closest to each data point, and then assign the data point to that neighborhood.
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km
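To make the nearest-centroid idea concrete, here is a minimal sketch of how I understand the method in Python. The centroid names and coordinates are placeholders I made up for illustration (they are not taken from the downloaded file), and `nearest_neighborhood` is a hypothetical helper name, not something from the linked answer.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# Placeholder centroids as (name, lon, lat); illustrative values only,
# NOT read from the real centroid file.
centroids = [
    ("Midtown", -73.98, 40.75),
    ("Williamsburg", -73.95, 40.71),
    ("Astoria", -73.92, 40.77),
]

def nearest_neighborhood(lon, lat, centroids):
    """Return the name of the centroid with the smallest haversine distance."""
    return min(centroids, key=lambda c: haversine(lon, lat, c[1], c[2]))[0]

# Classify one hypothetical pickup point.
print(nearest_neighborhood(-73.985, 40.748, centroids))
```

This does a linear scan over all centroids for every point, so the cost grows with rows × centroids. For a few million rows that is presumably where a vectorized distance computation or a spatial index would be needed.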
The problem is that I would like to use the same method, but in R. Alternatively, I would be happy with another equally efficient method.
Dataset: download (1.8 GB)