I have to traverse a dictionary of elements and their associated numeric values. For each element (key), a function `scoreElement()` is called to compute the value for that entry.
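For comparison, the single-threaded version of this step is a plain dict comprehension. The real `scoreElement()` isn't shown, so the one below is a hypothetical placeholder that just sums character codes, and `serialDict` is an illustrative name. The filtering mirrors the first line of `updateDict()` below.

```python
import re

def scoreElement(element, d):
    # Hypothetical placeholder: the real scoreElement() is not shown.
    return sum(ord(c) for c in element)

def serialDict(d, pattern):
    # Keep only keys matching the pattern, with values reset to 0 (as in updateDict()).
    d = {k: 0 for k in d if re.match(pattern, k)}
    # Score each remaining key one after another on a single thread.
    return {k: scoreElement(k, d) for k in d}

print(serialDict({"ABCDE": 0, "abcde": 0, "VWXYZ": 0, "ABC": 0}, r"^[A-Z]{5}$"))
# {'ABCDE': 335, 'VWXYZ': 440}
```

A serial result like this is also a convenient reference for checking that any parallel version returns the same dictionary.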
Without parallelism or multithreading this takes too long, so I tried using a `ThreadPoolExecutor` to parallelize the dictionary traversal, so that several elements are evaluated at the same time instead of one at a time:
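```python
import concurrent.futures
import math
import os
import re

# scoreElement(element, d) is defined elsewhere (not shown).

def paralelDict(item, d):
    # Score one chunk of keys and return the partial result.
    return {i: scoreElement(i, d) for i in item}

def updateDict(d, pattern):
    # Keep only the keys that match the pattern, with their values reset to 0.
    d = {k: 0 for (k, v) in d.items() if re.match(pattern, k)}
    n = os.cpu_count() * 2
    chunkSize = math.ceil(len(d) / n)
    keys = list(d.keys())
    out = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as executor:
        # Submit one task per chunk of chunkSize keys.
        futures = [executor.submit(paralelDict, keys[chunkSize*i:chunkSize*(i+1)], d)
                   for i in range(n)]
        # Merge the partial results as each chunk finishes.
        for future in concurrent.futures.as_completed(futures):
            out.update(future.result())
    return out

if __name__ == '__main__':
    ...
    d = updateDict(d, "^[A-Z][A-Z][A-Z][A-Z][A-Z]$")
    ...
```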
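The chunking in `updateDict()` can be checked on its own. The hypothetical helper `chunkKeys` below uses the same `chunkSize` formula and slicing. It shows that every key lands in exactly one chunk, and that when there are fewer keys than workers, some of the `n` submitted chunks are empty.

```python
import math

def chunkKeys(keys, n):
    # Same formula as updateDict(): n slices of at most chunkSize keys each.
    chunkSize = math.ceil(len(keys) / n)
    return [keys[chunkSize*i:chunkSize*(i+1)] for i in range(n)]

keys = ["K%d" % j for j in range(10)]
print([len(c) for c in chunkKeys(keys, 4)])      # [3, 3, 3, 1]
print([len(c) for c in chunkKeys(keys[:3], 8)])  # [1, 1, 1, 0, 0, 0, 0, 0]
```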
The problem is that, with this multithreaded version, CPU usage barely reaches 10-20% and the work runs on only one core, so it still doesn't finish in a reasonable time.
How can I make this solution actually work in parallel?
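One likely explanation for the symptom above, assuming `scoreElement()` is pure-Python, CPU-bound code (it isn't shown): in CPython, the global interpreter lock (GIL) lets only one thread at a time execute Python bytecode, so a `ThreadPoolExecutor` cannot spread that work across cores. A common alternative is `concurrent.futures.ProcessPoolExecutor`, which has the same `submit`/`as_completed` interface. The sketch below keeps the question's chunked structure. `updateDictParallel`, its `executor_cls` parameter and the placeholder `scoreElement()` are hypothetical, and it assumes the real `scoreElement()` and the dictionary can be pickled.

```python
import concurrent.futures
import math
import os
import re

def scoreElement(element, d):
    # Hypothetical placeholder for the real (CPU-bound) scoreElement().
    return sum(ord(c) for c in element)

def paralelDict(item, d):
    # Score one chunk of keys; with processes this runs in a worker process.
    return {i: scoreElement(i, d) for i in item}

def updateDictParallel(d, pattern,
                       executor_cls=concurrent.futures.ProcessPoolExecutor):
    d = {k: 0 for k in d if re.match(pattern, k)}
    # One worker per core: extra processes don't add CPU capacity.
    n = os.cpu_count() or 1
    keys = list(d)
    chunkSize = max(1, math.ceil(len(keys) / n))
    # Non-empty, contiguous chunks that together cover every key once.
    chunks = [keys[i:i + chunkSize] for i in range(0, len(keys), chunkSize)]
    out = {}
    # Each worker process has its own interpreter and GIL, so chunks can be
    # scored on different cores at the same time. Note that d is pickled
    # and sent along with every chunk.
    with executor_cls(max_workers=n) as executor:
        futures = [executor.submit(paralelDict, chunk, d) for chunk in chunks]
        for future in concurrent.futures.as_completed(futures):
            out.update(future.result())
    return out

if __name__ == "__main__":
    # The __main__ guard is required: with the "spawn" start method
    # (the default on Windows and macOS), worker processes re-import this module.
    data = {"ABCDE": 0, "abcde": 0, "VWXYZ": 0, "ABC": 0}
    result = updateDictParallel(data, r"^[A-Z]{5}$")
    print(dict(sorted(result.items())))  # {'ABCDE': 335, 'VWXYZ': 440}
```

Passing `executor_cls=concurrent.futures.ThreadPoolExecutor` runs the same code with threads, which makes it easy to time both versions on the real data. Whether processes actually help depends on how expensive each `scoreElement()` call is compared with the cost of pickling `d` for every chunk.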