This question is not new and has been discussed several times, for example in the two questions below, but I am new to Python.
- Geopy too slow - timeout all the time
- Timeout error in Python geopy geocoder
I have a dataset of 11,000 coordinate pairs and would like to get the zip code for each of them.
My data looks like:
   Longitude   Latitude
0 -87.548627  41.728184
1 -87.737227  41.749111
2 -87.743974  41.924143
3 -87.659294  41.869314
4 -87.727808  41.877007
Based on this question, I wrote a function that works for the first 10-20 rows but fails with a timeout error on larger inputs.
from geopy.geocoders import Nominatim

# Extract the zip code for a single row via reverse geocoding
def get_zipcode(df, geolocator, lat_field, lon_field):
    location = geolocator.reverse((df[lat_field], df[lon_field]))
    return location.raw['address']['postcode']

geolocator = Nominatim(user_agent='my-application')

# Test a sample with 20 rows
test = bus_stops_geo.head(20)

# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis=1, geolocator=geolocator,
                      lat_field='Latitude', lon_field='Longitude')
print(zipcodes)
0     60617
1     60652
2     60639
3     60607
4     60644
5     60659
6     60620
7     60626
8     60610
9     60660
10    60625
11    60645
12    60628
13    60620
14    60629
15    60628
16    60644
17    60638
18    60657
19    60631
dtype: object
I tried increasing the timeout, but without success so far.
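For context, here is a minimal sketch of how a longer timeout and request throttling can be configured in geopy. It assumes a geopy version that ships `geopy.extra.rate_limiter.RateLimiter`; the helper names `build_reverse` and `extract_postcode` and the specific timeout, delay, and retry values are illustrative assumptions, not tested settings. The 1-second delay reflects the public Nominatim server's limit of at most one request per second.

```python
def build_reverse(user_agent="my-application", timeout=10, min_delay_seconds=1.0):
    """Return a throttled reverse-geocoding callable (requires geopy)."""
    # Imported lazily so the helper below can be used without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    # timeout is the number of seconds to wait for each HTTP response.
    geolocator = Nominatim(user_agent=user_agent, timeout=timeout)

    # RateLimiter waits between calls and retries failed requests;
    # by default it returns None when all retries fail.
    return RateLimiter(
        geolocator.reverse,
        min_delay_seconds=min_delay_seconds,
        max_retries=3,
        error_wait_seconds=5.0,
    )


def extract_postcode(location):
    """Return the postcode of a geopy Location, or None if unavailable."""
    if location is None:
        return None
    return location.raw.get("address", {}).get("postcode")
```

With these helpers, a single lookup would look like `extract_postcode(build_reverse()((41.728184, -87.548627)))`. A missing result or a missing `postcode` key yields `None` instead of raising an exception.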
My questions:
- How can I do this for all 11,000 rows?
- How can I rewrite the function so that it returns not only the zip codes but also the original longitude and latitude?
- Are there any simple alternatives in other languages such as R, or in proprietary software (paid options work for me)?
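To make the second question concrete, the desired result could be sketched as below: one record per input row holding the latitude, the longitude, and the zip code. The function name `reverse_geocode_rows`, the `zipcode` key, and the caching of repeated coordinates are assumptions for illustration; `reverse` stands for any callable that accepts a `(lat, lon)` tuple and returns a geopy-style location or `None`, such as `geolocator.reverse`.

```python
def reverse_geocode_rows(coords, reverse):
    """Look up a postcode for each (lat, lon) pair.

    coords  -- iterable of (latitude, longitude) tuples
    reverse -- callable taking a (lat, lon) tuple and returning an object
               with a .raw dict, or None when nothing was found
    """
    cache = {}    # avoids repeating requests for duplicate coordinates
    records = []
    for lat, lon in coords:
        key = (lat, lon)
        if key not in cache:
            location = reverse(key)
            address = location.raw.get("address", {}) if location is not None else {}
            cache[key] = address.get("postcode")
        # Keep the original coordinates next to the result.
        records.append({"Latitude": lat, "Longitude": lon, "zipcode": cache[key]})
    return records
```

With the DataFrame above, the input would be `zip(bus_stops_geo['Latitude'], bus_stops_geo['Longitude'])`, and `pd.DataFrame(records)` would turn the result back into a table. At one request per second, 11,000 unique coordinates take roughly three hours.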
I would tremendously appreciate any help!