
I have a program where I am currently using a concurrent.futures.ThreadPoolExecutor to run multiple tasks concurrently. These tasks are typically I/O bound, involving access to local databases and remote REST APIs. However, these tasks could themselves be split into subtasks, which would also benefit from concurrency.

What I am hoping is that it is safe to use a concurrent.futures.ThreadPoolExecutor within the tasks. I have coded up a toy example, which seems to work:

import concurrent.futures


def inner(i, j):
    # Trivial stand-in for an I/O-bound subtask.
    return i, j, i**j


def outer(i):
    # Each outer task spins up its own nested executor for its subtasks.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(inner, i, j): j for j in range(5)}
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results


def main():
    # The top-level executor runs the outer tasks concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(outer, i): i for i in range(10)}
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.extend(future.result())
    print(results)


if __name__ == "__main__":
    main()

Although this toy example seems to work, I'd like some confidence that this is intended behaviour rather than something that merely happens to work. I would hope it is, because otherwise it would not be safe to use an executor to run arbitrary code, in case that code also used concurrent.futures to exploit concurrency.

Andrew McLean
  • Hmm, I think you should avoid a fork bomb. Did you take any measurements of the time spent before and after sub-threading? – cgte Jun 22 '18 at 13:21
  • This answer also proved informative to me: https://stackoverflow.com/questions/69736380/using-nested-asyncio-gather-inside-another-asyncio-gather – rtviii May 03 '23 at 23:56

1 Answer


There is absolutely no issue with spawning threads from other threads. Your case is no different.

Sooner or later, though, the overhead of spawning threads becomes quite high, and spawning ever more threads will actually slow your software down.

I highly suggest using a library like asyncio, which beautifully handles tasks asynchronously. It does so by using a single thread with non-blocking I/O. The results will probably be even faster than with normal threads, as the overhead is much smaller.
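For example, a rough asyncio sketch of the same nested pattern from your toy example might look like this (the i**j calls stand in for what would really be awaited non-blocking I/O such as database or REST calls):

import asyncio


async def inner(i, j):
    # In a real program this would await a non-blocking call (database, REST API, ...).
    return i, j, i**j


async def outer(i):
    # Nesting is natural here: each outer task simply awaits its own gather of subtasks.
    return await asyncio.gather(*(inner(i, j) for j in range(5)))


async def main():
    nested = await asyncio.gather(*(outer(i) for i in range(10)))
    results = [item for sublist in nested for item in sublist]
    print(results)


if __name__ == "__main__":
    asyncio.run(main())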

If you do not wish to use asyncio, why not create a second pool executor inside main() and pass it on to the outer() function? This way, instead of up to 25 nested worker threads (5 × 5), you will have a maximum of 10 (2 pools × 5 workers), which is much more reasonable.
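A minimal sketch of that idea, assuming outer() is changed to accept the shared inner executor as an extra parameter:

import concurrent.futures


def inner(i, j):
    return i, j, i**j


def outer(i, inner_executor):
    # Submit the subtasks to the shared inner executor instead of creating a new pool.
    futures = {inner_executor.submit(inner, i, j): j for j in range(5)}
    return [f.result() for f in concurrent.futures.as_completed(futures)]


def main():
    # Two separate pools: one for the outer tasks, one shared by all inner subtasks.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as outer_executor, \
            concurrent.futures.ThreadPoolExecutor(max_workers=5) as inner_executor:
        futures = {outer_executor.submit(outer, i, inner_executor): i for i in range(10)}
        results = []
        for future in concurrent.futures.as_completed(futures):
            results.extend(future.result())
    print(results)


if __name__ == "__main__":
    main()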

You cannot pass the same executor that main() uses to run outer() into outer() itself, as that might cause a deadlock: every worker thread could end up busy running outer(), each blocked waiting for inner() tasks that can never be scheduled.
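For illustration, this is the pattern that paragraph warns against; with 10 outer tasks and only 5 workers, every worker ends up blocked inside outer(), so calling main() here would hang:

import concurrent.futures


def inner(i, j):
    return i, j, i**j


def outer(i, executor):
    # The outer task submits its subtasks back to the SAME executor it is running in...
    futures = {executor.submit(inner, i, j): j for j in range(5)}
    # ...and then blocks waiting for them. If every worker thread is busy doing
    # exactly this, the inner tasks can never start and nothing ever completes.
    return [f.result() for f in concurrent.futures.as_completed(futures)]


def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(outer, i, executor): i for i in range(10)}
        for future in concurrent.futures.as_completed(futures):
            print(future.result())


# main() is deliberately not called: with 10 outer tasks and only 5 workers,
# all workers block in outer() waiting on queued inner tasks and the program deadlocks.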

Bharel