This is part 2 of a problem I am trying to solve: geocoding a DataFrame of addresses while staying within the API provider's limits. Yesterday I got help with the requests-per-second limit (Is there a slower or more controlled alternative to .apply()?), but now I need to handle the daily limit. My DataFrame has roughly 25K rows and the daily limit is 2,500, so I need to split it into roughly 10 pieces. Since debugging and development consume some of the daily quota, I think it's safer to use chunks of 2K. Here is what I have so far:
from time import sleep
import numpy as np

def f(x, delay=5):
    # Wrapper for .apply() that rate-limits each geocoding call
    sleep(delay)
    return geolocator.geocode(x)

# Process the DataFrame in chunks of 2,000 rows to stay under the daily quota
for g, df in df_clean.groupby(np.arange(len(df_clean)) // 2000):
    df['coord'] = df['Address'].apply(f).apply(lambda x: (x.latitude, x.longitude))
    sleep(untilNextDay)  # wait out the rest of the day before starting the next chunk
What I don't know how to do is stitch those chunked DataFrames back together. I could write them out to csv, I guess, but I'm sure there has to be a better way.
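For context, the csv fallback I have in mind would look roughly like this (just a sketch; the geocoded_chunk_*.csv file names are placeholders I made up):

import glob
import pandas as pd

# Inside the loop above, each finished chunk could be persisted like this
# (the file name pattern is just a placeholder):
df.to_csv('geocoded_chunk_{}.csv'.format(g), index=True)

# Once every chunk has been written, read them back and stitch them together,
# restoring the original row order via the saved index:
parts = [pd.read_csv(path, index_col=0) for path in glob.glob('geocoded_chunk_*.csv')]
df_geocoded = pd.concat(parts).sort_index()

That works, but round-tripping through the filesystem just to reassemble one DataFrame feels clumsy, which is why I'm asking.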