I have a string processing job in Python that I would like to speed up with a thread pool. The string processing tasks have no dependencies on each other, and each result is stored in a MongoDB database.
I wrote my code as follows:
import multiprocessing
from multiprocessing.pool import ThreadPool

def _process(s):
    # Do stuff: pure Python string manipulation.
    # Save the output to a database (PyMongo).
    ...

thread_pool_size = multiprocessing.cpu_count()
pool = ThreadPool(thread_pool_size)
for single_string in string_list:
    pool.apply_async(_process, [single_string])
pool.close()
pool.join()
I ran the code on a Linux machine with 8 CPU cores, and after letting the job run for a few minutes, the maximum CPU usage was only around 130% (as read from top).
Is a thread pool the right approach here? Is there a better way to do this?