
It is well known that starting too many threads is bad: it can significantly decrease performance and increase memory usage. However, I can't find anywhere whether the situation is the same if we call too many async functions.

As far as I know, asyncio is a kind of abstraction for parallel computing, and it may or may not use actual threading.

In my project, multiple asynchronous tasks are run, and each such task (currently implemented with threading) may start other threads. It is a risky situation. I'm thinking of two ways to solve the issue of too many threads. The first is to limit the number of 'software' threads to the number of 'hardware' threads. The other is to use asyncio. Is the second option reasonable in such a case?

Artyom Vancyan
chm

1 Answer


As far as I know, asyncio is a kind of abstraction for parallel computing, and it may or may not use actual threading.

Please do not confuse parallelism with asynchrony. In Python, you can achieve true parallelism only by using multiprocessing.

In my project, multiple asynchronous tasks are run, and each such task may start other threads.

All asynchronous tasks run in one event loop and use only one thread.
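A quick way to see this is to record the thread id inside each task — a minimal sketch:

```python
import asyncio
import threading

async def task(name):
    # Each task runs on the event loop's thread; await yields control
    # back to the loop so other tasks can run on that same thread.
    await asyncio.sleep(0.01)
    return threading.get_ident()

async def main():
    # Run several tasks concurrently and collect the thread id each saw.
    return await asyncio.gather(*(task(i) for i in range(5)))

thread_ids = asyncio.run(main())
# Every task reports the same thread id: one event loop, one thread.
print(len(set(thread_ids)))  # → 1
```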

I'm thinking of two ways to solve the issue of too many threads. The first is to limit the number of 'software' threads to the number of 'hardware' threads. The other is to use asyncio. Is the second option reasonable in such a case?

In this answer, I have demonstrated situations where async functions can be used. It mainly depends on the operations you perform. If your application works with threading and does not need multiprocessing, it can be converted to asynchronous tasks.
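As a rough sketch of such a conversion — the `fetch` coroutine here is hypothetical and stands in for whatever I/O-bound work each thread was doing — note that instead of capping a thread count, you can throttle tasks with a semaphore:

```python
import asyncio

# Hypothetical I/O-bound job; in a real project this would await a
# network call or disk read that previously ran in its own thread.
async def fetch(item):
    await asyncio.sleep(0.01)  # stands in for awaiting real I/O
    return item * 2

async def main():
    # A Semaphore caps how many tasks run at once, replacing a
    # thread-count limit with a cheap task-count limit.
    sem = asyncio.Semaphore(10)

    async def bounded(item):
        async with sem:
            return await fetch(item)

    return await asyncio.gather(*(bounded(i) for i in range(100)))

results = asyncio.run(main())
print(results[:3])  # → [0, 2, 4]
```

Tasks are far cheaper than threads, so running hundreds of them this way does not carry the same memory and scheduling cost as hundreds of threads.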

Artyom Vancyan
  • Thank you. Actually, my project performs heavy calculations in these threads. As far as I understand now, async is not suitable in such a case. Likely I'll just change the multithreading implementation to use the maximum number of hardware threads. A multiprocessing implementation is likely not suitable either, because different threads may use the same data objects (sharing objects between processes is more difficult). – chm Aug 22 '22 at 12:42
  • UPD: According to your comment (and other answers I found online), it seems that multithreading in Python will use one core anyway... – chm Aug 22 '22 at 12:49
  • asyncio and threading are best when working with I/O. If you're performing CPU-intensive tasks, multiprocessing is a better choice. Take a look at concurrent.futures.ProcessPoolExecutor. – dirn Aug 22 '22 at 14:10
  • Yes, that's right. `ProcessPoolExecutor` is a `multiprocessing`-based feature that allows running CPU-bound functions in parallel and getting their results at once. See [`map`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map). – Artyom Vancyan Aug 22 '22 at 14:24
  • As far as I understand, if I use the multiprocessing module, I can use shared memory with at least one copy of each object (for example, using the Ray lib). That is not really suitable in the case of big DataFrames. I tried the alternative interpreter nogil-3.9.10, and it works fine with pandas and the other libraries I need. I would prefer the alternative interpreter over multiprocessing because of the copying issues. – chm Aug 23 '22 at 09:51
  • Yes, it uses a copy of the objects for each process. If you're working with Pandas, you should split the DataFrame and process each part in a separate process. That way you gain execution speed, and the objects will not need to access each other. – Artyom Vancyan Aug 23 '22 at 10:10
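For completeness, a minimal sketch of the `ProcessPoolExecutor.map` approach discussed in the comments — the `cpu_bound` function is a hypothetical stand-in for the heavy calculation:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Stand-in for a heavy calculation; each call runs in its own
    # process, so the work spreads across CPU cores despite the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # map distributes the inputs across worker processes and
        # returns the results in input order.
        results = list(pool.map(cpu_bound, [10_000, 20_000, 30_000]))
    print(results)
```

Note that each worker receives a pickled copy of its input, which is why large shared objects (like a big DataFrame) are usually split into per-process chunks first.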