
I have two lists, `lzma2_list` and `rar_list`. Both hold a varying number of object names that change daily. The objects live in a directory called "O:". There are two methods that should handle this data:

bkp.zipto_rar(path,object_name)
bkp.zipto_lzma(path,object_name)

How could I process all items from both lists asynchronously, without waiting for one item to finish before starting the next?

I tried using the answers to the question "speed up compression using list asynchronously and threads", but in my case the methods take two parameters: one fixed, referring to the directory, and another that changes constantly, referring to the items in the list.


jhown
  • What do `bkp.zipto_rar` and `bkp.zipto_lzma` do? Is it IO- or CPU-bound work? – ruohola Dec 07 '22 at 16:07
  • I expect these operations to be CPU-bound. Therefore multiprocessing will be your best option – DarkKnight Dec 07 '22 at 16:14
  • Is the problem that you want to use the map function but you need to pass two arguments instead of one? Then the solution is to use functools.partial to create a new function with the first parameter fixed. – Paul Cornelius Dec 08 '22 at 09:45

1 Answer


Since your functions take parameters, you can use `functools.partial` to wrap them into callables that take no arguments.
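For example (using a stand-in function here, since the `bkp` module itself isn't shown in the question), `partial` freezes both arguments so the result can be called with none:

```python
import functools

# stand-in for bkp.zipto_lzma; the real bkp module isn't shown in the question
def zipto_lzma(path, object_name):
    return f'{path}/{object_name}.xz'

# freeze both parameters; func now takes no arguments
func = functools.partial(zipto_lzma, 'O:', 'report')
print(func())  # same as calling zipto_lzma('O:', 'report')
```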

Then you can use `asyncio.new_event_loop().run_in_executor` to process each item in background threads if the functions are IO-bound, or `multiprocessing.Pool` to process the items in background processes if they are CPU-bound.

You can even combine the two approaches and use many threads in each background process, but it's hard to write a useful example without knowing the specifics of your functions and lists. Gathering the results afterwards may also be non-trivial.

import asyncio
import functools

import bkp  # the asker's module providing zipto_rar / zipto_lzma

lzma2_list = []
rar_list = []


def process_lzma2_list():
    path = 'CONST'
    for item in lzma2_list:
        # freeze both arguments so the callable takes none
        func = functools.partial(bkp.zipto_lzma, path, item)
        # submits func to the loop's default thread pool immediately
        asyncio.new_event_loop().run_in_executor(None, func)


def process_rar_list():
    path = 'CONST'
    for item in rar_list:
        func = functools.partial(bkp.zipto_rar, path, item)
        asyncio.new_event_loop().run_in_executor(None, func)


if __name__ == '__main__':
    # it's OK to run these two functions sequentially, as they only create
    # tasks; the actual processing is done in background threads
    process_lzma2_list()
    process_rar_list()
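Alternatively, plain threads do the same job without asyncio. Here is a `concurrent.futures.ThreadPoolExecutor` sketch (with stand-ins for the `bkp` functions, since they aren't shown) that submits every item from both lists at once and also makes gathering the results straightforward:

```python
import concurrent.futures

# stand-ins for bkp.zipto_rar / bkp.zipto_lzma
def zipto_rar(path, object_name):
    return f'{path}/{object_name}.rar'

def zipto_lzma(path, object_name):
    return f'{path}/{object_name}.xz'

rar_list = ['x', 'y']
lzma2_list = ['z']
path = 'O:'

with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit every item from both lists up front; no call waits on another
    futures = [executor.submit(zipto_rar, path, item) for item in rar_list]
    futures += [executor.submit(zipto_lzma, path, item) for item in lzma2_list]
    # .result() blocks until each task finishes; list order is preserved
    results = [f.result() for f in futures]

print(results)
```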
  • You create a new event loop for each item, which means that there is no asyncio-based multitasking going on. You are using asyncio only as an inefficient way to start threads. You would get the same result by replacing `asyncio.new_event_loop().run_in_executor(...)` with `threading.Thread(target=func).start()`. You can pass arguments to a thread without functools.partial. – Paul Cornelius Dec 08 '22 at 09:40
  • I found that creating a new thread for each sync task is much faster than using one thread for many sync tasks (using asyncio.get_event_loop). Like orders of magnitude faster on a simple time.sleep() call. And we can't use actual cooperative multitasking anyway, as the functions in question don't use await inside them. – Jaŭhieni Harochaŭ Dec 08 '22 at 16:17
  • I am sure you are right but that was not my point. Your program runs and produces the desired behavior, but is very inefficient. You do not need to use asyncio at all in order to create threads - just use the methods in the threading module. – Paul Cornelius Dec 08 '22 at 21:38