
Code example:

async def download_page(session, url):
    print(True)


async def downloader_init(session):
    while True:
        url = await download_queue.get()
        task = asyncio.create_task(download_page(session, url))
        print(task)
        print(f"url: {url}")


async def get_urls(url):
    while True:
        try:
            url = find_somewhere_url
            await download_queue.put(url)
        except NoSuchElementException:
            break
    return True


async def main():
    async with aiohttp.ClientSession(headers=headers) as session:
        get_urls_task = asyncio.create_task(get_urls(url))
        downloader_init_task = asyncio.create_task(downloader_init(session))

        asyncio.gather(get_urls_task, downloader_init_task)


if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())

Output:

<Task pending coro=<download_page() running at main.py:69>>
url: https://someurl.com/example
<Task pending coro=<download_page() running at main.py:69>>
url: https://someurl.com/example
<Task pending coro=<download_page() running at main.py:69>>
url: https://someurl.com/example

Why is the download_page method never executed? The strange thing is that the script simply ends without any errors. downloader_init should run endlessly, but it does not.

get_urls adds links to download_queue as it finds them, then stops. downloader_init should process a link as soon as it appears in the queue, but it only starts once get_urls has finished.
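For reference, a consumer awaiting queue.get() does wake up as soon as the producer puts an item, provided both tasks are actually awaited. A minimal sketch, independent of the scraper code (names are illustrative):

```python
import asyncio

order = []

async def producer(queue):
    for i in range(3):
        await queue.put(i)
        order.append(f"put {i}")
        await asyncio.sleep(0)  # yield so the consumer can run

async def consumer(queue):
    for _ in range(3):
        item = await queue.get()
        order.append(f"got {item}")

async def main():
    queue = asyncio.Queue()
    # Awaiting gather keeps the loop alive until both finish.
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())
print(order)  # puts and gets interleave one by one
```

The consumer handles each item right after it is put, rather than waiting for the producer to finish.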

kshnkvn

1 Answer


Try this instead:

Note: Your problem wasn't with the task creation; it was the missing await on the asyncio.gather call.

import asyncio
import aiohttp

# Assumed from the original script: an asyncio.Queue shared by the
# producer and consumer. headers, url, find_somewhere_url, and
# NoSuchElementException are placeholders from the question's code.
download_queue = asyncio.Queue()


async def download_page(session, url):
    # Dummy function.
    print(f"session={session}, url={url}")


async def downloader_init(session):
    while True:
        url = await download_queue.get()
        task = asyncio.create_task(download_page(session, url))
        print(f"task={task}, url={url}")


async def get_urls(url):
    while True:
        try:
            url = find_somewhere_url()
            await download_queue.put(url)
        except NoSuchElementException:
            break


async def main():
    async with aiohttp.ClientSession(headers=headers) as session:
        get_urls_task = asyncio.create_task(get_urls(url))
        downloader_init_task = asyncio.create_task(downloader_init(session))

        # Use await here to make it finish the tasks.
        await asyncio.gather(get_urls_task, downloader_init_task)


if __name__ == "__main__":
    # Use this as it deals with the loop creation, shutdown,
    # and other stuff for you.
    asyncio.run(main())  # This is new in Python 3.7
GeeTransit
  • But I do not want to await the async tasks. The point of create_task should be that you don't need to await them and block execution. See example here, no need to await: https://stackoverflow.com/questions/44630676/how-can-i-call-an-async-function-without-await/44630895 – user3761555 Sep 18 '22 at 02:04
  • @user3761555 1. There is no await on the `download_page` task. The await I added is on the `asyncio.gather`. 2. The point of create_task is to have multiple concurrent tasks; awaiting _multiple_ tasks to wait until they all finish running is valid too. The OP was wondering why "the method download_page [was] not executed". Because their main method ended immediately after creating the `get_urls` and `downloader_init` tasks - they created but didn't wait for those tasks to complete - they thought their `download_page` method wasn't run. Your question is different from OP's question. – GeeTransit Sep 18 '22 at 20:32
  • @user3761555 Sorry if you got confused by my comment on the question. The answer I gave was different from what I said up there. I deleted it to prevent further confusion. Original comment for reference: I don't think you understand `create_task`. What it does is basically what it says; it creates a new task that runs the passed coroutine. It does run but the result is ignored. In `downloader_init`, you should use `await` on `download_page(session, url)` so that you get its result. Imma go make a full answer cuz this is gettin a bit too long ._. – GeeTransit Sep 18 '22 at 20:37
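The distinction discussed in these comments can be shown with a minimal sketch (no aiohttp; names are illustrative): a task created with create_task is scheduled, but if the coroutine that created it returns immediately, asyncio.run shuts the loop down and cancels the task before it does its work. Awaiting it via gather lets it finish.

```python
import asyncio

results = []

async def worker(name):
    await asyncio.sleep(0)  # yield once before doing the work
    results.append(name)

async def main_without_await():
    # Task is scheduled, but main returns right away;
    # asyncio.run cancels it before it appends anything.
    asyncio.create_task(worker("lost"))

async def main_with_await():
    task = asyncio.create_task(worker("done"))
    await asyncio.gather(task)  # actually wait for the task

asyncio.run(main_without_await())
asyncio.run(main_with_await())
print(results)  # only the awaited task's result survives
```

So create_task alone starts a task concurrently, but something must keep the loop alive long enough for it to run; that is what the added await on asyncio.gather does in the answer above.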