I would like to better understand semaphores. More precisely, I would like to know what purpose the semaphore serves in the code below:
```python
import aiohttp
import asyncio

async def fetch(session, url, sema):
    async with sema, session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        'http://python.org',
        'https://google.com',
        'http://yifei.me',
        'other urls...'
    ]
    tasks = []
    sema = asyncio.BoundedSemaphore(value=100)
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url, sema))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
```
which I found here: Using Python 3.7+ to make 100k API calls, making 100 in parallel using asyncio
Now, this link (https://pyquestions.com/aiohttp-set-maximum-number-of-requests-per-second) and this post (aiohttp.TCPConnector (with limit argument) vs asyncio.Semaphore for limiting the number of concurrent connections) say that the limit on simultaneous connections is already handled by the TCPConnector, and that its default value is 100 connections.
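For reference, setting that connector limit explicitly looks roughly like this (a minimal, runnable sketch; the URL is just an arbitrary example):

```python
import aiohttp
import asyncio

async def main():
    # TCPConnector's limit argument caps the number of simultaneous
    # TCP connections the session will open; 100 is the default,
    # written out explicitly here.
    connector = aiohttp.TCPConnector(limit=100)
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get('http://python.org') as response:
            print(response.status)

asyncio.run(main())
```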
Hence, in my opinion, the semaphore in the code above is redundant, since the number of connections is already capped by the TCPConnector. Moreover, according to this post (How to asyncio.gather tasks in chunks + use semaphore with TCP connections limit?), it will not reduce memory usage either, as all the coroutine objects are allocated up front even with a semaphore. Am I correct, or am I missing something? Is there any advantage to using a semaphore in the code above?
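If I understand the memory point correctly, the situation is the one in this self-contained sketch (job, the sleep, and the counts are dummy placeholders of my own):

```python
import asyncio

async def job(i, sema):
    async with sema:
        await asyncio.sleep(0.1)  # stand-in for real I/O

async def main():
    sema = asyncio.BoundedSemaphore(5)
    # All 10_000 coroutine objects are created and held in memory here,
    # even though the semaphore lets only 5 of them run their bodies
    # at any given moment.
    tasks = [job(i, sema) for i in range(10_000)]
    await asyncio.gather(*tasks)

asyncio.run(main())
```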
Moreover, if I had to pass 100k URLs through the code above, should I limit the number of workers via a queue, or should I just let it be?
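For concreteness, the queue-based alternative I have in mind looks roughly like this (a sketch with hypothetical names such as NUM_WORKERS and worker; not tested at 100k scale):

```python
import asyncio
import aiohttp

NUM_WORKERS = 100  # hypothetical cap on concurrent fetches

async def worker(session, queue, results):
    # Each worker pulls URLs off the queue until the queue is drained.
    while True:
        url = await queue.get()
        try:
            async with session.get(url) as response:
                results.append(await response.text())
        except aiohttp.ClientError:
            pass  # a real version would log or retry; kept minimal here
        finally:
            queue.task_done()

async def main(urls):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []
    async with aiohttp.ClientSession() as session:
        workers = [asyncio.create_task(worker(session, queue, results))
                   for _ in range(NUM_WORKERS)]
        await queue.join()  # wait until every URL has been processed
        for w in workers:
            w.cancel()      # workers loop forever, so cancel them
        await asyncio.gather(*workers, return_exceptions=True)
    return results

# asyncio.run(main(urls)) would drive this with a real list of URLs
```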