The Pythonic way would probably be to use asyncio. The problem you have is exactly what asyncio was designed for. The net result is broadly the same as using threading, but instead of threads you have tasks. When a task is blocked waiting on I/O, the event loop switches to a different task. The program stays single-threaded, so it avoids the overhead the GIL causes when switching between threads.
import asyncio

async def get_info(machine, user, data):
    # NB. async declaration
    ...

async def main():
    tasks = [
        asyncio.create_task(get_info(machine, user, data))
        for machine, user, data in machine_list
    ]
    done, _pending = await asyncio.wait(tasks)

    # asyncio is more powerful in that it allows you to directly get results of tasks.
    # This is unlike threading, where you must use some form of signalling
    # (such as a queue) to get data back from a thread.
    results = {}
    for args, task in zip(machine_list, tasks):
        # This gets the result immediately, since asyncio.wait has already
        # waited for every task to finish.
        result = await task
        results[args] = result

if __name__ == '__main__':
    asyncio.run(main())
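If all you need is the results, asyncio.gather is a shorter alternative to asyncio.wait followed by a loop. The following is only a minimal sketch: get_info here is a hypothetical stand-in that sleeps instead of contacting a machine, and machine_list is made-up sample data.

```python
import asyncio
import time

# Hypothetical stand-in for the real get_info: it sleeps to simulate
# waiting on a remote machine, then returns a small summary string.
async def get_info(machine, user, data):
    await asyncio.sleep(0.2)  # the event loop runs other tasks while this waits
    return f"{user}@{machine}: {data}"

# Made-up sample input; the real machine_list comes from your own code.
machine_list = [
    ("host-a", "alice", "uptime"),
    ("host-b", "bob", "disk"),
    ("host-c", "carol", "memory"),
]

async def collect(machines):
    # gather() schedules every coroutine and returns the results
    # in the same order as the coroutines were passed in.
    results = await asyncio.gather(*(get_info(*args) for args in machines))
    return dict(zip(machines, results))

def run_demo():
    start = time.perf_counter()
    results = asyncio.run(collect(machine_list))
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = run_demo()
    for args, info in results.items():
        print(args, "->", info)
    print(f"took {elapsed:.2f}s")
```

Because the three sleeps overlap, the whole run should take roughly as long as one call rather than three, all on a single thread.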
The problem with this approach is that you'll have to start using asyncio-aware libraries and rewrite your own code to be asyncio-aware. To get started, though, you can use asyncio.to_thread() (available from Python 3.9). It runs the given function in a separate thread and returns a coroutine you can await, so existing blocking code can be called unchanged.
import asyncio

def get_info(machine, user, data):
    # NB. no async declaration
    ...

async def main():
    tasks = [
        # to_thread() returns a coroutine; asyncio.wait needs tasks,
        # so wrap each one with create_task().
        asyncio.create_task(asyncio.to_thread(get_info, machine, user, data))
        for machine, user, data in machine_list
    ]
    done, _pending = await asyncio.wait(tasks)

if __name__ == '__main__':
    asyncio.run(main())
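To see the blocking function really running off the main thread and to get its return values back, something like the following sketch may help. The get_info body, the thread-name reporting, and machine_list are all assumptions made for illustration; time.sleep stands in for a blocking network call.

```python
import asyncio
import threading
import time

# Hypothetical blocking get_info: time.sleep stands in for a blocking
# call made with a library that knows nothing about asyncio.
def get_info(machine, user, data):
    time.sleep(0.1)
    # Also report which thread ran the call, purely for demonstration.
    return f"{user}@{machine}: {data}", threading.current_thread().name

# Made-up sample input.
machine_list = [
    ("host-a", "alice", "uptime"),
    ("host-b", "bob", "disk"),
    ("host-c", "carol", "memory"),
]

async def collect(machines):
    # to_thread() (Python 3.9+) returns a coroutine; gather() wraps each
    # one in a task, so the blocking calls run in worker threads.
    results = await asyncio.gather(
        *(asyncio.to_thread(get_info, *args) for args in machines)
    )
    return dict(zip(machines, results))

if __name__ == "__main__":
    for args, (info, thread_name) in asyncio.run(collect(machine_list)).items():
        print(args, "->", info, "(ran in", thread_name + ")")
```

Note that this is still threading under the hood, so the GIL overhead returns; it is a stepping stone while the code is migrated, not the end state.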
concurrent.futures
If you're heavily invested in the threading model, and switching to asyncio or learning its new concepts would be too much work, you can use concurrent.futures instead:
from concurrent.futures import ThreadPoolExecutor

def get_info(machine, user, data):
    ...

def get_info_helper(args):
    # executor.map passes a single argument, so unpack the tuple here
    machine, user, data = args
    return get_info(machine, user, data)

with ThreadPoolExecutor() as executor:
    # results come back in the same order as machine_list
    results = list(executor.map(get_info_helper, machine_list))
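One caveat with executor.map is that an exception in any call is re-raised while you iterate over the results, so you lose the results that come after it. If some machines may fail, a variant using submit() and as_completed() lets you keep going. This is a rough sketch only: the failing host, the max_workers value, and machine_list are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

# Hypothetical get_info that fails for one machine, to show error handling.
def get_info(machine, user, data):
    time.sleep(0.1)
    if machine == "host-b":
        raise ConnectionError(f"cannot reach {machine}")
    return f"{user}@{machine}: {data}"

# Made-up sample input.
machine_list = [
    ("host-a", "alice", "uptime"),
    ("host-b", "bob", "disk"),
    ("host-c", "carol", "memory"),
]

def collect(machines, max_workers=8):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which arguments each future belongs to.
        future_to_args = {
            executor.submit(get_info, *args): args for args in machines
        }
        # as_completed() yields futures as they finish, in completion order.
        for future in as_completed(future_to_args):
            args = future_to_args[future]
            try:
                results[args] = future.result()
            except Exception as exc:  # record the failure and carry on
                errors[args] = exc
    return results, errors

if __name__ == "__main__":
    results, errors = collect(machine_list)
    print("ok:", results)
    print("failed:", errors)
```

submit() also accepts the arguments directly, so the unpacking helper is not needed in this form.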