
I have a list of ~300K URLs for an API I need to get data from.

The API limit is 100 calls per second.

I have made a class for the asynchronous calls, but it is working too fast and I am hitting an error on the API.

How do I slow down the asynchronous calls so that I only make 100 calls per second?

import grequests

lst = ['url.com','url2.com']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    def async_requests(self):
        # renamed from 'async', which is a reserved keyword in Python 3.7+
        # size=5 caps the number of requests in flight at once
        return grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()
# here we collect the results returned by the asynchronous requests
results = test.async_requests()
response_text = test.collate_responses(results)
RustyShackleford
  • Try sleep(), but I suggest you use an IpProxy module; slowing down is not a necessary choice. – Henry Aug 24 '18 at 12:47
  • @Henryyuan thank you for the suggestion. While I read about IpProxy, where should I apply the sleep() in the code, and how long should the sleep() be? – RustyShackleford Aug 24 '18 at 12:50
  • Send 100 every second with a separate asynchronous call. You can also limit the maximum number of active requests (add to a count when they are sent and remove when they return, so you know how many are active). Those two things should sort it out (see the sketch after these comments). – E.Serra Aug 24 '18 at 12:51
  • @E.Serra I am not sure how to implement the 100 every second in another asynchronous call. Could you please show me your suggestion in code? – RustyShackleford Aug 24 '18 at 12:54
  • If there is something like a `throttle` module on the API side that limits calls per second, you can use an IpProxy module to send your requests through a proxy to get around this problem. From my understanding, you want to implement a crawler. – Henry Aug 24 '18 at 12:55
  • @Henryyuan no, not a crawler; I just need to get back data for every ID I pass into an API URL. – RustyShackleford Aug 24 '18 at 13:01
  • @RustyShackleford That is a kind of crawler in my understanding. – Henry Aug 24 '18 at 13:04
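
A minimal sketch of that batching suggestion, applied to the question's grequests code (the helper name fetch_in_batches and its parameters are assumptions added for illustration, not part of the original post): split the URL list into chunks of 100, map each chunk, and sleep for whatever is left of the current second before sending the next one.

import time
import grequests

def fetch_in_batches(urls, per_second=100, exception_handler=None):
    """Hypothetical helper: send at most per_second URLs per one-second window."""
    results = []
    for i in range(0, len(urls), per_second):
        batch = urls[i:i + per_second]
        started = time.monotonic()
        # size=5 caps how many requests are in flight at once (the 'active requests' limit)
        results.extend(grequests.map((grequests.get(u) for u in batch),
                                     exception_handler=exception_handler,
                                     size=5))
        elapsed = time.monotonic() - started
        if elapsed < 1:
            time.sleep(1 - elapsed)  # pace the batches to roughly 100 calls per second
    return results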

3 Answers


The first step that I took was to create an object that can distribute a maximum of n coins every t ms.

import time

class CoinsDistribution:
    """Object that distribute a maximum of maxCoins every timeLimit ms"""
    def __init__(self, maxCoins, timeLimit):
        self.maxCoins = maxCoins
        self.timeLimit = timeLimit
        self.coin = maxCoins
        self.time = time.perf_counter()


    def getCoin(self):
        if self.coin <= 0 and not self.restock():
            return False

        self.coin -= 1
        return True

    def restock(self):
        t = time.perf_counter()
        if (t - self.time) * 1000 < self.timeLimit:
            return False
        self.coin = self.maxCoins
        self.time = t
        return True

Now we need a way of forcing functions to only be called if they can get a coin. To do that, we can write a decorator function that we could use like this:

@limitCalls(callLimit=1, timeLimit=1000)
def uniqFunctionRequestingServer1():
    return 'response from s1'

But sometimes multiple functions are requesting the same server, so we would want them to get coins from the same CoinsDistribution object. Therefore, the decorator can also be used by supplying the CoinsDistribution object directly:

server_2_limit = CoinsDistribution(3, 1000)

@limitCalls(server_2_limit)
def sendRequestToServer2():
    return 'it worked !!'

@limitCalls(server_2_limit)
def sendAnOtherRequestToServer2():
    return 'it worked too !!'

We now have to create the decorator. It can take either a CoinsDistribution object or enough data to create a new one.

import functools

def limitCalls(obj=None, *, callLimit=100, timeLimit=1000):
    if obj is None:
        obj = CoinsDistribution(callLimit, timeLimit)

    def limit_decorator(func):
        @functools.wraps(func)
        def limit_wrapper(*args, **kwargs):
            if obj.getCoin():
                return func(*args, **kwargs)
            return 'limit reached, please wait'
        return limit_wrapper
    return limit_decorator

And it's done! Now you can limit the number of calls to any API that you use, and you can build a dictionary to keep track of your CoinsDistribution objects if you have to manage a lot of them (for different API endpoints or different APIs); see the sketch after the note below.

Note: Here I have chosen to return an error message if there are no coins available. You should adapt this behaviour to your needs.
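
For that dictionary idea, a minimal sketch (the registry name and the endpoint keys are hypothetical, not part of the answer):

# one CoinsDistribution per API you talk to
limiters = {
    'api.example.com': CoinsDistribution(100, 1000),       # 100 calls per second
    'slow-api.example.com': CoinsDistribution(10, 1000),   # 10 calls per second
}

@limitCalls(limiters['api.example.com'])
def get_user(user_id):
    return 'response for user {}'.format(user_id)

@limitCalls(limiters['slow-api.example.com'])
def get_report(report_id):
    return 'response for report {}'.format(report_id)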

TitouanT

You can just keep track of how much time has passed and decide whether to make more requests.

For example, this will print 100 numbers per second:

from datetime import datetime
import time

start = datetime.now()
counter = 0
while True:
    if counter >= 100:
        # 100 items in this window: wait out whatever is left of the second, then reset
        elapsed = (datetime.now() - start).total_seconds()
        if elapsed < 1:
            time.sleep(1 - elapsed)
        start = datetime.now()
        counter = 0
    print(counter)
    counter += 1
ChatterOne
  • Thank you for the answer. I am not understanding how using time will allow me to batch the calls I have made. Could you show me with my code how I can adapt it to yours? – RustyShackleford Aug 24 '18 at 13:15

This other question on SO shows exactly how to do this. By the way, what you need is usually called throttling.
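
As a rough illustration of throttling with plain requests and a sleep between batches (a minimal sketch, following the suggestion in the comments below; it gives up grequests' concurrency, and the helper name is an assumption):

import time
import requests

def throttled_fetch(urls, per_second=100):
    """Hypothetical helper: fetch URLs one by one, never exceeding
    per_second requests in any one-second window."""
    responses = []
    window_start = time.monotonic()
    for i, url in enumerate(urls, start=1):
        responses.append(requests.get(url))
        if i % per_second == 0:
            elapsed = time.monotonic() - window_start
            if elapsed < 1:
                time.sleep(1 - elapsed)  # wait out the rest of the one-second window
            window_start = time.monotonic()
    return responses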

Pablo M
  • I am not understanding the implementation; I will need to adapt it to my code. Thank you for the suggestion. I wish someone could give me an easier solution. – RustyShackleford Aug 24 '18 at 13:02
  • The solution implemented there stops at error 429; I don't need that, I only need to batch 100 URLs per second. – RustyShackleford Aug 24 '18 at 13:05
  • Well, if you were using requests instead of grequests, it would be as simple as putting a sleep. But I guess you have some other requirement that makes you use grequests. – Pablo M Aug 24 '18 at 13:06
  • Can requests handle making ~250K calls in a relatively short amount of time? Because when I try with requests it takes a very long time. – RustyShackleford Aug 24 '18 at 13:07
  • Honestly, I don't know. But if you're going to be forced to slow down to a 100 per second, I guess who's faster doesn't matter so much. It's gonna take, at least, 41 minutes. – Pablo M Aug 24 '18 at 13:09
  • Another issue I run into with requests is that of timeouts. – RustyShackleford Aug 24 '18 at 13:11