
I am coming from a C# background, and Python's asyncio library is confusing me.

I have read the following (1, 2), yet the use of asyncio remains unclear to me.

I am trying to make an asynchronous website scraper in Python.

import asyncio

import requests
from bs4 import BeautifulSoup

async def requestPage(url):
    request = requests.get(url, headers=headers)
    soup = BeautifulSoup(request.content, 'html.parser')
    return soup


async def main():

    #****** How do I run an async task and store its result to use in another task?
    index_soup = asyncio.ensure_future(requestPage(index_url))
    res = asyncio.gather(index_soup)
    currency_urls = res.select('a[href^="/currencies"]')

    print(currency_urls)


loop = asyncio.get_event_loop()

try:
    loop.run_until_complete(main())
finally:
    loop.close()
    This is never really going to be asynchronous because requests isn’t asynchronous. You may want to consider using a library like [aiohttp](https://docs.aiohttp.org/en/stable/index.html) instead. – dirn Sep 05 '18 at 02:50

2 Answers


As the requests library is not asynchronous, you can use the run_in_executor method so that it won't block the running thread. As a result, you can define requestPage as a regular function and call it from the main function like this:

    res = await asyncio.gather(loop.run_in_executor(None, requestPage, url))

The blocking function will run in a separate executor thread, while control is returned to the event loop. (Note that gather returns a list of results, so res above will be a one-element list.)
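
Put together, a minimal sketch of this approach might look like the following (`headers` and `index_url` are placeholders standing in for the values from the question):

    import asyncio

    import requests
    from bs4 import BeautifulSoup

    headers = {}        # placeholder request headers
    index_url = "..."   # placeholder index page URL

    def requestPage(url):
        # Plain blocking function; it runs inside the executor's thread pool
        request = requests.get(url, headers=headers)
        return BeautifulSoup(request.content, 'html.parser')

    async def main():
        loop = asyncio.get_event_loop()
        # Off-load the blocking call; the event loop stays responsive
        # while the request runs in a worker thread
        index_soup = await loop.run_in_executor(None, requestPage, index_url)
        print(index_soup.select('a[href^="/currencies"]'))

    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

gather earns its keep once several pages are fetched concurrently, e.g. `results = await asyncio.gather(*(loop.run_in_executor(None, requestPage, u) for u in urls))`.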

Or you can use an async HTTP client library, like aiohttp:
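
For instance, a minimal sketch using aiohttp (assuming it is installed; `headers` and `index_url` are again placeholders) could look like this:

    import asyncio

    import aiohttp
    from bs4 import BeautifulSoup

    headers = {}        # placeholder request headers
    index_url = "..."   # placeholder index page URL

    async def requestPage(session, url):
        # aiohttp performs the request without blocking the event loop
        async with session.get(url, headers=headers) as response:
            html = await response.text()
        return BeautifulSoup(html, 'html.parser')

    async def main():
        # One session is reused for all requests
        async with aiohttp.ClientSession() as session:
            index_soup = await requestPage(session, index_url)
            print(index_soup.select('a[href^="/currencies"]'))

    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()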


Ok, I think I found a basic solution.

import asyncio

import requests
from bs4 import BeautifulSoup

async def requestPage(url):
    request = requests.get(url, headers=headers)
    soup = BeautifulSoup(request.content, 'html.parser')
    return soup


async def getValueAsync(func, param):
    # Wrap the coroutine in a task and schedule it on the event loop
    task = asyncio.ensure_future(func(param))
    # Run the task to completion; gather returns a list of results
    await asyncio.gather(task)
    # Read the result off the finished task
    return task.result()

async def main():
    soup = await getValueAsync(requestPage, index_url)
    print(soup.encode("utf-8"))


loop = asyncio.get_event_loop()

try:
    loop.run_until_complete(main())
finally:
    loop.close()

I wrote a wrapper that allows me to call the function asynchronously and store the result.

    This code effectively does `await gather(ensure_future(requestPage(url)))`. There is no difference between that and a simple `await requestPage(url)` - `gather` is meant to await *multiple* tasks. Also, you don't need an additional call to `task.result()`, `await` will return the result right away. Finally, to make the code actually asynchronous, you need to use a library like `aiohttp`, not `requests`. A rule of thumb is: if your `async def` doesn't await anything, it is not async and could (as far as behavior is concerned) as well be an ordinary `def`. – user4815162342 Sep 05 '18 at 05:42
  • @user4815162342 thank you for the feedback. I thought I first had to create a task and execute it. Can you explain to me then the use of ensure_future? – M.Nar Sep 05 '18 at 17:25
  • `ensure_future` is useful when you need the actual future object, e.g. so you can call `add_done_callback` or similar. In your case you gain nothing by getting a task object, because you immediately await it (see the sketch after this thread). – user4815162342 Sep 05 '18 at 17:28
  • Ahhh. Would a fair comparison be that `ensure_future` is similar to javascript promises? As in, I can call the execution a task and then later decide how to handle the result? – M.Nar Sep 05 '18 at 17:43
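
To illustrate the distinction discussed in this thread, here is a minimal sketch; the `fetch` coroutine and URLs are hypothetical stand-ins for real async work:

    import asyncio

    async def fetch(url):
        # Hypothetical stand-in for real async work (e.g. an aiohttp request)
        await asyncio.sleep(0.1)
        return "contents of " + url

    def on_done(task):
        # Runs as soon as the task finishes
        print("callback saw:", task.result())

    async def main():
        # Awaiting a coroutine directly already returns its result
        result = await fetch("http://example.com/a")
        print(result)

        # ensure_future is worthwhile when you want the Task object itself,
        # e.g. to attach a completion callback before awaiting it
        task = asyncio.ensure_future(fetch("http://example.com/b"))
        task.add_done_callback(on_done)
        await task

    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

In that sense a Task is loosely comparable to a JavaScript promise: it starts running as soon as the event loop gets a chance, and you can either attach callbacks to it or await its result later.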