15

I'm trying to create a script that sends over 1000 requests to one page at the same time, using the requests library with threading (1000 threads). It seems to do the first 50 or so requests all within 1 second, whereas the other 9950 take considerably longer. I measured it like this.

import time
import threading
import requests

queueLock = threading.Lock()
header = {}  # the real headers are set here

def print_to_cmd(message):
    queueLock.acquire()
    print(message)
    queueLock.release()

# each thread runs this timed block:
start = time.time()
resp = requests.get('http://test.net/', headers=header)
end = time.time()

print_to_cmd(str(end - start))
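
The threading side is roughly like this (a simplified sketch of what the script does, reusing the timed block above in a worker function):

def worker():
    start = time.time()
    resp = requests.get('http://test.net/', headers=header)
    end = time.time()
    print_to_cmd(str(end - start))

# start roughly 1000 threads at once
threads = [threading.Thread(target=worker) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()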

I'm thinking the requests library is limiting how fast they are being sent.

Does anybody know a way in Python to send all the requests at the same time? I have a VPS with 200 Mb upload, so bandwidth is not the issue; it's something to do with Python or the requests library limiting it. They all need to hit the website within 1 second of each other.

Thanks for reading and I hope somebody can help.

john doe
  • 151
  • 1
  • 1
  • 3
  • 1
    Are you trying to overload a site? – Uncle Dino Nov 03 '16 at 00:21
  • 1
    Nobody is going to help you DDOS a website. – tito Nov 03 '16 at 00:23
  • 9
    If I wanted to DDOS a website I would use multiple servers with shells. – john doe Nov 03 '16 at 00:27
  • You might look into [BoundedSemaphore](https://docs.python.org/3.6/library/asyncio-sync.html#boundedsemaphore) (or for [Python 2](https://docs.python.org/2.7/library/threading.html?highlight=semaphore#threading.BoundedSemaphore)). It may be more flexible, in terms of concurrency throughput, than a simple lock. – kevin628 Nov 03 '16 at 00:37
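
A minimal sketch of that BoundedSemaphore suggestion (the limit of 50 is illustrative); each thread would call limited_get instead of calling requests.get directly:

import threading
import requests

sem = threading.BoundedSemaphore(50)  # cap on how many requests are in flight at once

def limited_get(url):
    with sem:  # blocks while 50 requests are already running
        return requests.get(url)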

3 Answers

30

I have generally found that the best solution is to use an asynchronous library like tornado. The easiest solution that I have found, however, is to use ThreadPoolExecutor.


import requests
from concurrent.futures import ThreadPoolExecutor

def get_url(url):
    return requests.get(url)

# list_of_urls is whatever list of URLs you want to hit
with ThreadPoolExecutor(max_workers=50) as pool:
    print(list(pool.map(get_url, list_of_urls)))
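
If you also want the per-request timings the question measures, a small variation of the same idea (just a sketch; get_url_timed is an illustrative name, and list_of_urls is assumed to be defined as above):

import time

def get_url_timed(url):
    start = time.time()
    resp = requests.get(url)
    return resp.status_code, time.time() - start

with ThreadPoolExecutor(max_workers=50) as pool:
    for status, elapsed in pool.map(get_url_timed, list_of_urls):
        print(status, elapsed)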
Avi Mosseri
  • 1,258
  • 1
  • 18
  • 35
  • 1
    Sure, you may want to play around with the max_workers parameter to get faster run times – Avi Mosseri Nov 03 '16 at 00:52
  • Yes, I have noticed 1k requests take considerably longer, but still, it is better than what I had before. – john doe Nov 03 '16 at 00:55
  • 1
    There is typo brah, I did correct it: ```print(list(pool.map(get_url(list_of_urls)))``` – Ender phan Nov 21 '18 at 08:46
  • 3
    @Enderphan Nope, check out how `map` works: https://docs.python.org/3/library/functions.html#map ThreadPoolExecutor().map() is the same idea. – Jacktose Dec 11 '19 at 22:22
  • With POST it gives me the error requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) – Lore Jul 10 '20 at 06:35
  • @Lore you might want to add a try/catch there and maybe some error handling/retries. That means that your http request was unsuccessful. – Avi Mosseri Jul 24 '20 at 22:48
13

I know this is an old question, but you can now do this using asyncio and aiohttp.

import asyncio
import aiohttp
from aiohttp import ClientSession

async def fetch_html(url: str, session: ClientSession, **kwargs) -> str:
    resp = await session.request(method="GET", url=url, **kwargs)
    resp.raise_for_status()
    return await resp.text()

async def make_requests(url: str, **kwargs) -> None:
    async with ClientSession() as session:
        tasks = []
        for _ in range(1000):
            tasks.append(
                fetch_html(url=url, session=session, **kwargs)
            )
        results = await asyncio.gather(*tasks)
        # do something with results

if __name__ == "__main__":
    asyncio.run(make_requests(url='http://test.net/'))

You can read more about it and see an example here.
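
If the target can't take roughly 1000 simultaneous connections, a common variation is to bound concurrency with an asyncio.Semaphore. A minimal sketch on top of the code above (the limit of 50 is illustrative):

async def fetch_html_bounded(url: str, session: ClientSession, sem: asyncio.Semaphore, **kwargs) -> str:
    async with sem:  # waits here while 50 requests are already in flight
        return await fetch_html(url=url, session=session, **kwargs)

async def make_requests_bounded(url: str, **kwargs) -> None:
    sem = asyncio.Semaphore(50)  # illustrative cap on concurrent requests
    async with ClientSession() as session:
        tasks = [fetch_html_bounded(url, session, sem, **kwargs) for _ in range(1000)]
        results = await asyncio.gather(*tasks)
        # do something with results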

Marius Stănescu
  • 3,603
  • 2
  • 35
  • 49
  • `asyncio.run` is a Python 3.7 addition. For previous versions, refer to this [discussion](https://stackoverflow.com/q/52796630/9625777) – Pe Dro Jun 08 '20 at 10:33
1

Assuming that you know what you are doing, I would first suggest implementing a backoff policy with jitter to prevent a predictable "thundering herd" against your server. That said, you should consider doing some threading:

import threading

class FuncThread(threading.Thread):
    def __init__(self, target, *args):
        # initialize the Thread first, then store our own target and args
        threading.Thread.__init__(self)
        self._target = target
        self._args = args

    def run(self):
        self._target(*self._args)

so that you would do something like

t = FuncThread(doApiCall, url)
t.start()

where your method doApiCall is defined like this

def doApiCall(url):
    # make the HTTP request for the given url here
    ...
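
For the backoff-with-jitter policy mentioned at the start of this answer, one hypothetical way to flesh that out (the retry count and delays here are illustrative, not prescribed):

import random
import time
import requests

def doApiCall(url, retries=5):
    for attempt in range(retries):
        try:
            return requests.get(url)
        except requests.exceptions.RequestException:
            if attempt == retries - 1:
                raise
            # exponential backoff with jitter: 1s, 2s, 4s, ... plus a random offset
            time.sleep((2 ** attempt) + random.random())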
loretoparisi
  • 15,724
  • 11
  • 102
  • 146
  • Hi, thanks for the fast response. I used threading in the previous attempt but sent the requests using the requests library for Python. Does this make the request without having to use urllib or requests? – john doe Nov 03 '16 at 00:36