I have a Pandas dataframe with ~20k rows, and I am trying to geocode the address column into lat/long coordinates.

How can I use time.sleep() or some other function to avoid the Too Many Requests (429) error that OSM Nominatim is currently returning?

Here's the code I use for this:

from geopy.geocoders import Nominatim
from geopy.distance import vincenty

geolocator = Nominatim()
df['coord'] = df['address'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df.head()

Thanks in advance!

nyi
seizethedata
  • pandas `apply` calls `geopy` once per row with no pause between requests. The usage policy allows at most 1 request per second: https://operations.osmfoundation.org/policies/nominatim/ – eagle Apr 03 '18 at 23:13

2 Answers

geopy since 1.16.0 includes a RateLimiter class which provides a convenient way to deal with the Too Many Requests 429 error by adding delays between the queries and retrying the failed requests.

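from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Nominatim's usage policy requires an identifying user agent
geolocator = Nominatim(user_agent="specify_your_app_name_here")

# Wait at least 1 second between consecutive geocode calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# geocode returns None when an address is not found, so guard the attribute access
df['coord'] = df['address'].apply(geocode).apply(
    lambda location: (location.latitude, location.longitude) if location else None
)
df.head()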

Docs: https://geopy.readthedocs.io/en/1.16.0/#usage-with-pandas
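
To make the "delay between queries, retry on failure" idea concrete, here is a minimal standard-library sketch of that behaviour. It is not geopy's implementation: the helper name `rate_limited` and its `clock` and `sleep` parameters are hypothetical, introduced only so the logic can be exercised without network access.

```python
import time


def rate_limited(func, min_delay_seconds=1.0, max_retries=2,
                 error_wait_seconds=5.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so calls are spaced out and failed calls are retried.

    Hypothetical helper for illustration only; not geopy's RateLimiter.
    """
    last_call = [None]  # start time of the most recent call, shared across invocations

    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            # Wait until at least min_delay_seconds have passed since the last call.
            if last_call[0] is not None:
                remaining = min_delay_seconds - (clock() - last_call[0])
                if remaining > 0:
                    sleep(remaining)
            last_call[0] = clock()
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: let the caller see the error
                sleep(error_wait_seconds)  # back off before trying again

    return wrapper
```

In practice the built-in `RateLimiter` is the better choice; the sketch only shows why spacing the calls and backing off after an error keeps the request rate under the service's limit.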

KostyaEsmukov
  • This solution works for me if I set `min_delay_seconds=2`. I am not using the Nominatim object, though; I am using something like `from geopy.geocoders import GoogleV3; nom = GoogleV3(scheme="http")` – codestruggle Aug 22 '18 at 08:03

You could use a for loop. Without seeing your data, I would expect it to look something like this:

import time

x = df['address'].tolist()
names = []

for item in x:
    d = {}
    a = geolocator.geocode(item, exactly_one=True, timeout=60)
    try:
        d["Latitude"] = a.latitude
        d["Longitude"] = a.longitude
    except AttributeError:
        # geocode() returned None: no match for this address
        pass
    time.sleep(2)  # wait 2 seconds before the next request
    names.append(d)

names

The time.sleep(2) call waits 2 seconds before the next iteration of the loop. If the geocoder cannot find the latitude and longitude for an address, the loop skips it instead of raising an error and forcing you to start over.
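
The same loop can be sketched as a reusable function that records an explicit `None` for addresses that were not found, so the results stay aligned with the input rows. The function name `geocode_all` and its `delay_seconds` and `sleep` parameters are assumptions made for illustration; `geocode` stands for any callable that behaves like `geolocator.geocode` and returns `None` when there is no match.

```python
import time


def geocode_all(addresses, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Geocode addresses one at a time, pausing between requests.

    `geocode` is any callable returning an object with .latitude and
    .longitude attributes, or None when nothing is found.
    """
    rows = []
    for i, address in enumerate(addresses):
        if i > 0:
            sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(address)
        if location is None:
            # Keep a placeholder so rows stay aligned with the input order.
            rows.append({"Latitude": None, "Longitude": None})
        else:
            rows.append({"Latitude": location.latitude,
                         "Longitude": location.longitude})
    return rows
```

Called as `geocode_all(df['address'].tolist(), geolocator.geocode)`, this returns one dict per address in input order, which can then be turned into columns with `pd.DataFrame(rows)`.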

CandleWax
  • Apparently I still get the same error, for whatever reason, even with time.sleep(2000) – seizethedata Apr 03 '18 at 23:04
  • Ugh, may be the case :/ Do they block for time spans or permanently? – seizethedata Apr 03 '18 at 23:08
  • @seizethedata You may want to check https://operations.osmfoundation.org/policies/nominatim/. Nominatim is not intended for bulk geocoding, although small one-time bulk requests may be permitted. Definitely limit yourself to 1 request per second. You may want to wait and try again tomorrow. – user1558604 Apr 03 '18 at 23:21