11

I am trying to learn async, and right now I am trying to get whois information for a batch of domains. I found the lib aiowhois, but it comes with only a few lines of documentation, which is not enough for a newbie like me.

This code runs without errors, but I don't know how to print the data from the parsed_whois variable, which is a coroutine object.

import asyncio
import aiowhois

# domains is a list of domain names, defined elsewhere

resolv = aiowhois.Whois(timeout=10)

async def coro(url, sem):
    parsed_whois = await resolv.query(url)

async def main():
    tasks = []
    sem = asyncio.Semaphore(4)

    for url in domains:
        task = asyncio.Task(coro(url, sem))
        tasks.append(task)
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Benyamin Jafari
HoneyBee
  • I think you can avoid using tasks. Just apply `gather` to `coro(url, sem)` directly. You can rename the list of tasks to `coros` if you like – Pynchia Nov 24 '19 at 20:16
  • What do you use the semaphore for? – Pynchia Nov 24 '19 at 20:22
  • This code is made from parts of other programs; I'm still not very clear about everything here =( – HoneyBee Nov 25 '19 at 09:07
  • Not answering your question, but just helping for the future: especially in gTLDs, whois is dying; the new protocol to use is RDAP. Since it is based on HTTPS, any async HTTP library will be able to handle it without problems. Except with very good reasons, new software should be built using RDAP today, not whois anymore. Also, in both cases the input should be a domain name, not a URL. – Patrick Mevzek Nov 25 '19 at 17:43 (a minimal RDAP sketch follows these comments)
  • This information is very useful for me!! I didn't know that, thanks a lot! – HoneyBee Nov 26 '19 at 15:06
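
Following up on the RDAP comment above, here is a minimal sketch of an async RDAP lookup using aiohttp and the rdap.org bootstrap redirector; the domain list and the printed field are placeholders, and error handling is omitted:

import asyncio
import aiohttp

async def rdap_lookup(session, domain):
    # rdap.org redirects to the RDAP server responsible for the domain's TLD
    async with session.get(f"https://rdap.org/domain/{domain}") as resp:
        resp.raise_for_status()
        # RDAP servers answer with content type application/rdap+json,
        # so disable aiohttp's strict application/json check
        return await resp.json(content_type=None)

async def main():
    domains = ["example.com", "example.org"]  # placeholder list
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(rdap_lookup(session, d) for d in domains))
    for domain, data in zip(domains, results):
        print(domain, data.get("status"))

asyncio.run(main())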

2 Answers

6

You can avoid using tasks. Just apply gather to the coroutines directly. In case you are confused about the difference, this SO Q&A might help you (especially the second answer).

You can have each coroutine return its result, without resorting to global variables:

async def coro(url):
    return await resolv.query(url)

async def main():
    domains = ...
    ops = [coro(url) for url in domains]
    rets = await asyncio.gather(*ops)
    print(rets)

Please see the official docs to learn more about how to use gather or wait, or even more options.
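
For example, a small sketch of the wait variant (reusing coro from above; on recent Python versions wait expects Task objects, so the coroutines are wrapped first):

async def main():
    domains = ...
    tasks = [asyncio.create_task(coro(url)) for url in domains]
    done, pending = await asyncio.wait(tasks)  # pending is empty when no timeout is given
    for t in done:  # done is a set, so results are unordered
        print(t.result())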

Note: if you are using a recent Python version (3.7+), you can also simplify running the event loop with just

asyncio.run(main())

Note 2: I have removed the semaphore from my code, as it's unclear why you need it and where.
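
If the semaphore was meant to limit the number of concurrent whois queries (the question creates asyncio.Semaphore(4)), here is a sketch of how it could be wired back in:

async def coro(url, sem):
    async with sem:  # at most 4 queries run at the same time
        return await resolv.query(url)

async def main():
    domains = ...
    sem = asyncio.Semaphore(4)
    rets = await asyncio.gather(*(coro(url, sem) for url in domains))
    print(rets)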

Pynchia
3
all_parsed_whois = []  # make a global

async def coro(url, sem):
    all_parsed_whois.append(await resolv.query(url))

If you want the data as soon as it is available, you could use task.add_done_callback():

python asyncio add_done_callback with async def
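
A minimal sketch of that approach (the callback is a plain function that receives the finished Task; result() re-raises any exception the task hit):

async def coro(url):
    return await resolv.query(url)

def print_result(task):
    # runs as soon as the task finishes
    print(task.result())

async def main():
    tasks = []
    for url in domains:
        task = asyncio.create_task(coro(url))
        task.add_done_callback(print_result)
        tasks.append(task)
    await asyncio.gather(*tasks)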

Scott P.