
I have a list of objects, clusters, which I compare against each other using itertools.combinations and map():

likelihoods = map(do_comparison, itertools.combinations(clusters, 2))

To speed this up I use multiple processes instead:

from multiprocessing import Pool
pool = Pool(6)
likelihoods = pool.map_async(do_comparison, itertools.combinations(clusters, 2)).get() 

For small lists this works great. However, with 16700 objects in clusters (139,436,650 combinations), pool.map_async() uses huge amounts of memory and my PC quickly runs out of it, while the plain map() version has no memory problems at all.

My PC runs out of memory before the worker processes are even started, so my guess is that the pool is still dividing the data into chunks for the different processes up front. I therefore tried chunksize=1, so that only a small part has to be prepared at a time, but this did not work.
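
The exact call is not shown above, but the attempt presumably looked roughly like this (chunksize is an optional keyword argument of map_async):

likelihoods = pool.map_async(do_comparison, itertools.combinations(clusters, 2), chunksize=1).get()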

Are there other methods to let map_async() use less memory?
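
For reference, one direction that might keep memory bounded: as far as I can tell, Pool.map_async() turns an iterable that has no __len__ into a list before splitting it into chunks, whereas Pool.imap_unordered() pulls items from the generator only as the workers consume them. A sketch of that approach, assuming clusters and do_comparison are defined as above (untested at this scale):

from multiprocessing import Pool
import itertools

pool = Pool(6)
likelihoods = []
# imap_unordered draws pairs from the generator as workers become free,
# so the ~139 million tuples are never built up as one big list
for result in pool.imap_unordered(do_comparison,
                                  itertools.combinations(clusters, 2),
                                  chunksize=1000):
    likelihoods.append(result)
pool.close()
pool.join()

The results still end up in a single list here, just as in the original code; the difference is only on the input side.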

Niek de Klein
