I'm using the pandas `apply` function with a function that calls an API to populate a column in a dataframe. The API is limited to 500 queries per second (QPS). If I exceed that limit, I will get throttled and eventually blocked. How can I limit the rate at which `apply` calls the API while populating the dataframe?

Harry M

1 Answer

You'll have to keep track of how many requests you've made in the last second (or whatever timeframe your rate limit specifies). Once you hit the limit, or just before that, you'll have to wait for a bit.

import time
import pandas
import datetime
import requests

# Generate some test data
df = pandas.DataFrame([{"todo_id": i} for i in range(100)])

# Define variables to keep track of throttling
count = 0
last_reset_time = datetime.datetime.now()
rate_limit = 5  # Max number of requests per second


# Define a function to enrich the data
def enrich(row: pandas.Series) -> pandas.Series:

    # Access the global variables
    global count
    global last_reset_time

    # Wait until we are below the rate limit
    if count >= rate_limit:
        while True:
            now = datetime.datetime.now()
            if (now - last_reset_time) <= datetime.timedelta(seconds=1):
                print("Waiting...")
                time.sleep(0.1)
            else:
                count = 0
                last_reset_time = datetime.datetime.now()
                break

    # Enrich the data
    todo_id = row['todo_id']
    response = requests.get(f"https://jsonplaceholder.typicode.com/todos/{todo_id}", timeout=10)
    response.raise_for_status()  # Fail loudly on HTTP errors (e.g. 429 Too Many Requests)
    data = response.json()
    for key, value in data.items():
        row[key] = value

    # Increment the counter
    count += 1

    # Return the row, which will be put in the dataframe
    return row


# Apply the function; apply returns a new dataframe, so keep the result
df = df.apply(enrich, axis=1)
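
Note that the counter above only resets once the limit has been hit, so with some timings a few requests late in one window followed by a full batch right after the reset can briefly exceed the per-second limit. If that matters for your API, a sliding window is a possible alternative. The following is a minimal sketch, not a definitive implementation: the class name `SlidingWindowLimiter` and its `clock` and `sleep` parameters are made up for illustration, and it assumes single-threaded use.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until fewer than max_calls calls happened in the last `period` seconds.

    Hypothetical helper for illustration; not thread-safe.
    """

    def __init__(self, max_calls, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls inside the current window

    def wait(self):
        while True:
            now = self.clock()
            # Drop timestamps that have left the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
```

To use it, create `limiter = SlidingWindowLimiter(max_calls=5)` once and call `limiter.wait()` at the top of `enrich` in place of the global counter logic.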
Gijs Wobben
  • I appreciate this question was asked over 2 years ago! However, is there a way to apply a rate limiter/throttle to an existing function please? `df['embeddings'] = df['Narrative Description'].apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001'))` – Jon Jan 18 '23 at 21:13
  • Sure! Just copy the code I posted and replace the part after `# Enrich the data` with `row["embeddings"] = get_embedding(row["Narrative Description"], engine=f"text-search-{size}-doc-001")`. – Gijs Wobben Jan 19 '23 at 07:54
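
If you would rather not edit the existing function at all, another option is to wrap it in a decorator that spaces calls out. This is only a sketch under assumptions: `throttle` is a hypothetical helper (not part of pandas or any library), it enforces a minimum interval between calls rather than counting requests per window, and the `get_embedding` below is a stand-in with a made-up body so the example runs on its own.

```python
import functools
import time


def throttle(max_per_second):
    """Decorator: space calls at least 1 / max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second

    def decorator(func):
        last_call = None  # monotonic timestamp of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                remaining = min_interval - (time.monotonic() - last_call)
                if remaining > 0:
                    time.sleep(remaining)
            last_call = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Stand-in for the real get_embedding (hypothetical body, for illustration only)
def get_embedding(text, engine):
    return [float(len(text))]


# Wrap the existing function without changing it
throttled_get_embedding = throttle(max_per_second=50)(get_embedding)
```

The apply call then becomes `df['embeddings'] = df['Narrative Description'].apply(lambda x: throttled_get_embedding(x, engine=f'text-search-{size}-doc-001'))`. Even spacing is simpler than a counter but never allows bursts, so pick `max_per_second` with some margin below the API's limit.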