
Within trio/anyio, is it possible to pause tasks until I perform a specific operation, and then resume all of them?

Let's say that I run a specific function to obtain a valid cookie and then start to crawl a website. After some time the cookie expires, and I need to run the previous function again to obtain a new cookie.

So if I spawn 10 tasks under the nursery and the cookie expires while 6 tasks are running, how can I pause all of them and run this function only one time?

import trio
import httpx


async def get_cookies(client):
    # Let's say that here I will use a headless browser to obtain a valid cookie.
    pass


limiter = trio.CapacityLimiter(20)


async def crawler(client, url, sender):
    async with limiter, sender:
        r = await client.get(url)
        if "something special happen" in r.text:
            pass
            # Here I want to say: if my cookie has expired,
            # then I want to run get_cookies() only one time.
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        await get_cookies(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, client, url, sender.clone())


async def rec(receiver):
    async with receiver:
        async for i in receiver:
            print(i)

if __name__ == "__main__":
    trio.run(main)


1 Answer


You simply wrap get_cookies in an `async with some_lock:` block. In that block, if you already have a cookie (let's say it's a global variable), you return it; otherwise you acquire one and then set the global.

When you notice that the cookie has expired, you delete it (i.e. set the global back to None) and call get_cookies.

In other words, something along these lines:

import trio
import httpx


class CrawlData:
    def __init__(self, client):
        self.client = client
        self.valid = False
        self.lock = trio.Lock()
        self.limiter = trio.CapacityLimiter(20)
        
    async def get_cookie(self):
        if self.valid:
            return

        async with self.lock:
            # Re-check after acquiring the lock: another task may have
            # refreshed the cookie while we were waiting on it.
            if self.valid:
                return

            ... # fetch cookie here, using self.client

            self.valid = True
                                   
    async def get(self, url):
        r = await self.client.get(url)
        if check_for_expired_cookie(r):
            self.valid = False  # invalidate so get_cookie() actually refreshes
            await self.get_cookie()
            r = await self.client.get(url)
            if check_for_expired_cookie(r):
                raise RuntimeError("New cookie doesn't work", r)
        return r
        

async def crawler(data, url, sender):
    async with data.limiter, sender:
        r = await data.get(url)
        await sender.send(r.text)
        
        
async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        data = CrawlData(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, data, url, sender.clone())
        ...
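The important detail is the double-checked lock in get_cookie: when several tasks notice the expired cookie at about the same time, they all call get_cookie, but only the first one to acquire the lock actually fetches a new cookie. The others block on `async with self.lock:` until it is done, then see that self.valid is already True and return immediately. So the tasks that need the cookie pause themselves on the lock, and the refresh runs only once per expiry.

To see why only one fetch happens, here is a minimal, self-contained sketch of the same pattern (CookieHolder, worker, fetch_count, and the sleep are invented for this demo; they stand in for the real cookie fetch):

import trio


class CookieHolder:
    # Minimal sketch of the double-checked-lock pattern;
    # fetch_count exists only to show the fetch runs once.
    def __init__(self):
        self.valid = False
        self.lock = trio.Lock()
        self.fetch_count = 0

    async def get_cookie(self):
        if self.valid:
            return
        async with self.lock:
            if self.valid:  # another task already refreshed it
                return
            await trio.sleep(0.5)  # pretend to fetch a cookie
            self.fetch_count += 1
            self.valid = True


async def main():
    holder = CookieHolder()

    async def worker(n):
        await holder.get_cookie()  # all ten workers hit this together
        print(f"worker {n} has a cookie")

    async with trio.open_nursery() as nursery:
        for n in range(10):
            nursery.start_soon(worker, n)

    print("cookie fetched", holder.fetch_count, "time(s)")  # -> 1


trio.run(main)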
Matthias Urlichs
  • Thank you. Can you please elaborate with a simple example, if you don't mind? – αԋɱҽԃ αмєяιcαη Nov 15 '22 at 08:26
  • If all tasks run on the same thread and process, it might be simpler to just make `get_cookies` synchronous. – McSinyx Nov 15 '22 at 08:42
  • Don't. The code would no longer be composable, meaning if you use it within a larger program or as part of a longer pipeline it'd frequently stall for no good reason. Also, if you already have an async stack anyway, adding some sync code and copying the cookie between the two is a lot of things, but not "simple". – Matthias Urlichs Nov 16 '22 at 10:02