
I've written a script that fetches URLs from a file and sends HTTP requests to all of them concurrently. I now want to limit the number of HTTP requests per second and the bandwidth per interface (eth0, eth1, etc.) in a session. Is there any way to achieve this in Python?

Ian Stapleton Cordasco
Naveen

3 Answers


You could use a Semaphore object, which is part of the standard Python library: python doc

Or if you want to work with threads directly, you could use wait([timeout]).

There is no library bundled with Python that can control an Ethernet or other network interface. The lowest level you can go is socket.

Based on your reply, here's my suggestion. Notice the active_count() call; use it only to verify that your script runs no more than two worker threads. In this case the count will actually be three, because the first thread is the main script itself and the other two are the URL requests.

import threading

import requests

# Limit the number of concurrent worker threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    # Request the passed URL.
    r = requests.get(u)
    print(r.status_code)
    # Release the slot for other threads.
    pool.release()
    # Show the number of active threads.
    print(threading.active_count())

def req():
    # Get URLs from a text file, stripping whitespace.
    with open('urllist.txt') as f:
        urls = [url.strip() for url in f]
    for u in urls:
        # Blocks while the number of running workers is at the limit.
        pool.acquire(blocking=True)
        # Create a new thread and pass the URL to the worker function.
        t = threading.Thread(target=worker, args=(u,))
        # Start the newly created thread.
        t.start()

req()
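The snippet above caps how many requests are in flight, but not how many are launched per second. A minimal sketch of adding a launch-rate cap on top of the semaphore (the function name rate_limited_run and both limit values are my own illustrative choices, not from the question):

```python
import time
import threading

def rate_limited_run(worker, items, max_concurrent=2, per_second=2.0):
    """Run worker(item) in threads, capping both concurrency and launch rate."""
    pool = threading.BoundedSemaphore(max_concurrent)
    interval = 1.0 / per_second
    threads = []

    def wrapped(item):
        try:
            worker(item)
        finally:
            pool.release()  # free the slot even if the worker raises

    for item in items:
        pool.acquire()                          # block while max_concurrent are in flight
        t = threading.Thread(target=wrapped, args=(item,))
        t.start()
        threads.append(t)
        time.sleep(interval)                    # space out launches to cap the rate
    for t in threads:
        t.join()
```

Passing a worker that calls requests.get(u) reproduces the behaviour of the answer above while also spacing out the launches.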
Georgi
  • 71
  • 4
  • How do I combine it with my script? I'm a beginner in Python. – Naveen Sep 29 '14 at 11:44
  • You would need to post your source (the threading portion) in order for someone to be helpful. As the Python docs say, "Semaphores are often used to guard resources with limited capacity". Start with the following and later expand it to fit your code. First set a limit = 5. Then you need a pool of threads -> pool = BoundedSemaphore(value=limit). Then lock a thread with pool.acquire(), send the HTTP request (e.g. with urllib2), and finally unlock the thread with pool.release(). – Georgi Oct 01 '14 at 08:45
  • import threading
    import time
    import requests

    def req():
        urls = [url.strip() for url in open('urllist.txt')]
        for u in range(len(urls)):
            r = requests.get(urls[u])
            print r.status_code, urls[u]

    threads = []
    threads = threading.Thread(target=req)
    threads.start()

    – Naveen Oct 01 '14 at 09:27

You could use a worker concept as described in the documentation: https://docs.python.org/3.4/library/queue.html

Add a wait (e.g. time.sleep()) inside your workers to make them pause between requests (in the example from the documentation: inside the "while True" loop, after the task_done call).

Example: 5 worker threads with a waiting time of 1 second between requests will do no more than 5 fetches per second.
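The pattern from the queue documentation can be sketched roughly as follows; the helper name start_workers, the worker count, and the delay are illustrative assumptions, not part of the linked docs:

```python
import queue
import threading
import time

def start_workers(handle, num_workers=5, delay=1.0):
    """Queue-based worker pool; each worker waits `delay` seconds between tasks."""
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: shut this worker down
                q.task_done()
                break
            handle(item)
            q.task_done()
            time.sleep(delay)        # pause between requests to cap the rate

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    return q

# Usage sketch (handle could call requests.get):
#   q = start_workers(my_fetch_function, num_workers=5, delay=1.0)
#   for u in urls:
#       q.put(u)
#   q.join()  # block until every URL has been processed
```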

Kuishi

Note: the solution below still sends the requests serially, but it limits the TPS (transactions per second).

TL;DR: there is a class that keeps a count of how many calls can still be made in the current second. The count is decremented for every call made and refilled every second.

import time
from multiprocessing import Process, Value

# Naive TPS regulation

# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second) # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True

        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1)  # say I want to send 1 request per second
    tps_bucket.start()
    total_number_of_requests = 60  # say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1

            print('Request', request_number)  # this is my request

            if request_number == total_number_of_requests:
                break

    print(time.time() - t0, 'seconds elapsed')  # how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()
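As a design note, the background refill process can be avoided entirely by refilling lazily from elapsed time. A sketch of that alternative (the class name LazyTPSBucket is mine, not from the answer above); it trades the extra process for a small time check on every get_token() call:

```python
import time
import threading

class LazyTPSBucket:
    """Token bucket that refills based on elapsed time; no background process."""

    def __init__(self, expected_tps):
        self.expected_tps = expected_tps
        self.tokens = expected_tps
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def get_token(self):
        with self.lock:
            now = time.monotonic()
            if now - self.last_refill >= 1.0:
                self.tokens = self.expected_tps   # refill once per elapsed second
                self.last_refill = now
            if self.tokens > 0:
                self.tokens -= 1
                return True
            return False
```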
Dharman
el Punch