Python ThreadPoolExecutor not using all CPU cores

Question

I have to traverse a dictionary with elements and their associated numeric values. For each element, a certain function scoreElement() is executed to calculate the proper value for each entry key.

The problem takes too long to execute without using parallelism or multithreading, so I tried to use a ThreadPoolExecutor to parallelize the dictionary traversal, so multiple elements will be evaluated each iteration instead of only one.

def paralelDict(item, d):
    return {i: scoreElement(i, d) for i in item}


def updateDict(d, pattern):
    d = {k: 0 for (k, v) in d.items() if re.match(pattern, k)}

    n = os.cpu_count()*2
    chunkSize = math.ceil(len(d) / n)
    out = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as executor:
        futures = [executor.submit(paralelDict,list(d.keys())[chunkSize*i:chunkSize*(i+1)], d) for i in range(n)]

    for future in concurrent.futures.as_completed(futures):
        out.update(future.result())

    return out
if __name__ == '__main__':
  .
  .
  .
  d = updateDict(d, f"^[A-Z][A-Z][A-Z][A-Z][A-Z]$")
  .
  .

The issue is that after implementing this multithreading solution, the CPU barely reaches 10-20% of usage and only executes the process with one core, not achieving the desired result in a decent time.

How can I make this solution actually work in parallel?

Python threads are still subject to GIL and make more sense for I/O bound things rather than CPU-bound things. You probably want [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). — Jared Smith, Nov 20 '22 at 18:20
@jared-smith I tried multiprocessing, but unfortunately it says `paralelDict` function is not Picklable — Cardstdani, Nov 20 '22 at 18:21
[https://stackoverflow.com/a/8805244/2823755](https://stackoverflow.com/a/8805244/2823755) — wwii, Nov 20 '22 at 19:31
@wwii It basically has to traverse the original hash map and use regex per each element — Cardstdani, Nov 20 '22 at 20:39
Why not just include those instructions in `scoreElement` and write it out as a *regular* for loop that constructs the return dict - maybe it will then be pickle-able - dispense with the one-liner - and try again with ProcessPoolExecutor. — wwii, Nov 20 '22 at 21:29
@wwii Somehow I just managed it to work in Google Colab, but I still have to check if it's really improving program performance — Cardstdani, Nov 21 '22 at 13:58

Python ThreadPoolExecutor not using all CPU cores

0 Answers0