
I'm trying to send HTTPS requests as quickly as possible. I know this would have to be done with concurrent requests, since my goal is 150 to 500+ requests a second. I've searched everywhere, but can't find an answer for Python 3.11+, or one that doesn't give me errors. I'm trying to avoid AIOHTTP, as the rigmarole of setting it up was a pain and it didn't even work.

The input should be an array of URLs and the output an array of the HTML strings.

Surgemus
  • Unrelated: I've tried the identical thing in PHP using multi-cURL, with some success: I was able to average 50/sec. However, as time went on the speed would slow down exponentially; after 30 minutes it would go from 50/sec to <0.1/sec. This Python script will be running for literal weeks, as well. – Surgemus Nov 25 '22 at 00:02
  • What's the RTT between your host and the requested host? Take a look at `concurrent.futures.ThreadPoolExecutor` and `concurrent.futures.ProcessPoolExecutor`. They're easy to use and a good place to start with concurrency. Prefer threads over processes since this is an I/O-bound task, but be aware that you'll probably need multiple processes running multiple threads to hit your throughput target. – Michael Ruth Nov 25 '22 at 00:22
  • Take a look at this [answer](https://stackoverflow.com/questions/40222719/python-performance-best-parallelism-approach/66300611#66300611), which achieved 750 packets/sec. It's packets and sockets rather than HTTPS, but it may help you come up with a solution. – Michael Ruth Nov 25 '22 at 00:24
  • @MichaelRuth Thank you, I looked into `ThreadPoolExecutor` and it seems to have worked. [See my answer.](https://stackoverflow.com/a/74588957/12950945) My target site is getting about 250/sec, averaging 30Mbps. Is there a way to fix the bottleneck via code so it may potentially go up to 500Mbps? – Surgemus Nov 27 '22 at 10:45
  • Profile the code and see where the bottleneck is. After that, try adding processes. For a dedicated host running only this application, (CPU cores - 1) processes is a good place to start. Each process should use your `ThreadPoolExecutor` code. If you have 500/30 + 1 = 17.667 ≈ 18 cores, and your network can handle the load, you could get close to 500Mbps. These are all back-of-the-envelope calculations, though, and not many folks have 18 cores to work with. Your best bet is to move this app into a cloud provider that can scale. – Michael Ruth Nov 27 '22 at 20:39
  • @MichaelRuth Thank you for the insight. I'll look into this for sure. – Surgemus Nov 28 '22 at 00:28
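
A minimal sketch of the processes-times-threads layout described in the comments above (the worker counts, chunking scheme, and the fetch/fetch_many helpers are illustrative, not from this thread):

import concurrent.futures
import requests

def fetch(url: str) -> str:
    # one HTTPS GET, run inside a worker thread
    return requests.get(url).text

def fetch_many(urls: list[str]) -> list[str]:
    # each process runs its own thread pool, since the work is I/O-bound
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        return list(pool.map(fetch, urls))

def main() -> None:
    urls = ["https://example.com"] * 1000
    n_procs = 4  # try (CPU cores - 1) on a dedicated host
    chunks = [urls[i::n_procs] for i in range(n_procs)]  # split work across processes
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_procs) as pool:
        responses = [html for chunk in pool.map(fetch_many, chunks) for html in chunk]
    print(len(responses), "responses")

if __name__ == "__main__":
    main()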

3 Answers


It's quite unfortunate that you couldn't set up AIOHTTP properly, because it's one of the most efficient ways to do asynchronous requests in Python.

Setup is not that hard:

import asyncio
import aiohttp
from time import perf_counter


def urls(n_reqs: int):
    for _ in range(n_reqs):
        yield "https://python.org"

async def get(session: aiohttp.ClientSession, url: str):
    # reuse the shared session; reading the body releases the connection
    async with session.get(url) as response:
        _ = await response.text()

async def main(n_reqs: int):
    # a single ClientSession is shared across all requests
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[get(session, url) for url in urls(n_reqs)]
        )


if __name__ == "__main__":
    n_reqs = 10_000
    
    start = perf_counter()
    asyncio.run(main(n_reqs))
    end = perf_counter()
    
    print(f"{n_reqs / (end - start)} req/s")

You basically need to create a single ClientSession, which you then reuse to send the GET requests. The requests are made concurrently thanks to asyncio.gather(). You could also use the newer asyncio.TaskGroup (Python 3.11+):

async def main(n_reqs: int):
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as group:
            for url in urls(n_reqs):
                group.create_task(get(session, url))

This easily achieves 500+ requests per second on my 7-year-old dual-core computer. Contrary to what other answers suggest, this solution does not require spawning thousands of threads, which are expensive.

You may improve the speed even more by using a custom connector to allow more concurrent connections (the default is 100) in a single session:

async def main(n_reqs: int):
    # limit=0 removes the default cap of 100 simultaneous connections
    connector = aiohttp.TCPConnector(limit=0)
    async with aiohttp.ClientSession(connector=connector) as session:
        ...
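
Since the question asks for an array of HTML strings, here is a hedged variant of the same approach that returns the response bodies in input order (asyncio.gather preserves argument order; the fetch_all name is my own):

import asyncio
import aiohttp

async def get(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls: list[str]) -> list[str]:
    # one shared session; gather returns the bodies in input order
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(get(session, url) for url in urls))

htmls = asyncio.run(fetch_all(["https://python.org"] * 100))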

Louis Lac
  • I'm flabbergasted. That worked amazingly. In my earlier testing, I would have to download all the things mentioned in [this video](https://www.youtube.com/watch?v=hgNxAxyncdc) just to get aiohttp or asyncio to work. But your code just worked instantly! Thank you!! – Surgemus Dec 03 '22 at 21:19

Hope this helps. This question asked: what is the fastest way to send 10,000 HTTP requests?

I observed 15,000 requests in 10 s, using Wireshark to capture on localhost; I saved the packets to CSV and only counted packets that contained GET.

FILE: a.py

from treq import get
from twisted.internet import reactor

def done(response):
    if response.code == 200:
        # chain the next request as soon as this one succeeds
        get("http://localhost:3000").addCallback(done)

get("http://localhost:3000").addCallback(done)

reactor.callLater(10, reactor.stop)  # stop the reactor after 10 seconds
reactor.run()

Run the test like this:

pip3 install treq
python3 a.py  # code from above

Set up the test website like this; mine was on port 3000. Create app.js (shown below) before running the final command:

mkdir myapp
cd myapp
npm init
npm install express
node app.js

FILE: app.js

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

OUTPUT

grep GET wireshark.csv  | head
"5","0.000418","::1","::1","HTTP","139","GET / HTTP/1.1 "
"13","0.002334","::1","::1","HTTP","139","GET / HTTP/1.1 "
"17","0.003236","::1","::1","HTTP","139","GET / HTTP/1.1 "
"21","0.004018","::1","::1","HTTP","139","GET / HTTP/1.1 "
"25","0.004803","::1","::1","HTTP","139","GET / HTTP/1.1 "

grep GET wireshark.csv  | tail
"62145","9.994184","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62149","9.995102","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62153","9.995860","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62157","9.996616","::1","::1","HTTP","139","GET / HTTP/1.1 "
"62161","9.997307","::1","::1","HTTP","139","GET / HTTP/1.1 "

atl
  • When I try to install treq, everything goes fine until I get this install error: `Building wheel for twisted-iocpsupport (pyproject.toml) ... error: subprocess-exited-with-error`. Any way I can fix this? – Surgemus Nov 25 '22 at 02:10
  • Maybe [failed building wheel for twisted in windows 10 python 3](https://stackoverflow.com/questions/51483792/failed-building-wheel-for-twisted-in-windows-10-python-3) answers this; otherwise, try the [Twisted community links](https://docs.twisted.org/en/latest/community.html). Unfortunately, I don't use Windows, so I'm limited in how I can help. – atl Nov 25 '22 at 17:40

This works, getting around 250+ requests a second, and it runs fine on Windows 10. You may have to pip install requests; concurrent.futures is part of the standard library, so it needs no install.

import time
import requests
import concurrent.futures

urls = []  # input URLs/IPs array
responses = []  # output: the body of each response as a string

# create a list of 5000 sites to test with
for y in range(5000):
    urls.append("https://example.com")

def send(url):
    # .text gives the HTML as a string (the question asks for strings)
    return requests.get(url).text

start = time.perf_counter()  # time before the requests are sent

with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = [executor.submit(send, url) for url in urls]
# leaving the with-block waits for every future; collect in input order
responses = [future.result() for future in futures]

end = time.perf_counter()  # time after everything finishes
print(f"{round(len(urls) / (end - start), 1)}/sec")  # average requests per second

Output: 286.0/sec

Note: If you need to handle each response as soon as it completes (for something extremely time-dependent), replace the executor block with this, which collects results in completion order:

with concurrent.futures.ThreadPoolExecutor(max_workers=10000) as executor:
    futures = [executor.submit(send, url) for url in urls]
    for future in concurrent.futures.as_completed(futures):
        # results arrive in completion order, not input order
        responses.append(future.result())

This is a modified version of what this site showed in an example.

The secret sauce is max_workers=10000; without it, the script averaged about 80/sec. That said, raising it beyond 1000 didn't give any further speed boost.
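
One further tweak worth trying (my suggestion, not part of this answer): reuse a requests.Session per worker thread so TCP/TLS connections are pooled instead of re-opened for every request. A sketch using threading.local:

import threading
import requests
import concurrent.futures

thread_local = threading.local()

def get_session() -> requests.Session:
    # one Session per worker thread; a Session is not guaranteed
    # thread-safe, but per-thread reuse enables connection pooling
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def send(url: str) -> str:
    return get_session().get(url).text

urls = ["https://example.com"] * 5000
with concurrent.futures.ThreadPoolExecutor(max_workers=1000) as executor:
    responses = list(executor.map(send, urls))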

Surgemus