I have created a class that stores several large dictionaries and has a method that takes a file as input and processes it using the information contained in the dictionaries.

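For context, a simplified sketch of what I mean (the real dictionaries and the processing logic are of course much larger, and the attribute names here are just placeholders; only the shape matters):

class Export:
    def __init__(self):
        # Several large lookup dictionaries, loaded once at start-up
        self.uniref_map = {}
        self.uniprotkb_map = {}

    def get_uniref_uniprotkb_from_panproteome(self, species_file):
        # Reads the input file and resolves its entries through the dictionaries
        ...

export = Export()
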
Now, the total number of files I have to process is around 18,000, so I opted to use a multiprocessing.dummy.Pool in this way:

from multiprocessing import dummy  # thread-backed Pool that mimics the multiprocessing API

with dummy.Pool(processes=50) as pool:
    failed = [x for x in pool.imap(export.get_uniref_uniprotkb_from_panproteome, species, chunksize=300)]

What I noticed in htop is that the main process correctly spawns 50 threads, but only one is in the running state, even if I change the chunksize.

  • You have to close the pool and then join as well. pool.close(), pool.join(). – Lavanya Pant Oct 06 '18 at 08:59
  • Python threads generally don't run concurrently because of the GIL (Global Interpreter Lock). Because of that, they're of little help in speeding up compute-bound processing. Using "real" multiprocessing might offer some improvement, depending on exactly what needs to be done. – martineau Oct 06 '18 at 08:59
  • 1
    Possible duplicate of [multiprocessing.dummy in Python is not utilising 100% cpu](https://stackoverflow.com/questions/26432411/multiprocessing-dummy-in-python-is-not-utilising-100-cpu) – Matteo Italia Oct 06 '18 at 09:06
  • @LavanyaPant I'm using the `with` statement – Francesco Oct 06 '18 at 09:23
  • @Francesco You may not require close, but pool.join is required. – Lavanya Pant Oct 07 '18 at 07:11
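
Following up on martineau's comment about the GIL: if get_uniref_uniprotkb_from_panproteome is CPU-bound, only one thread can execute Python bytecode at a time, which matches the single running thread seen in htop. Switching from multiprocessing.dummy (threads) to a process-based Pool would let the work run in parallel. A minimal sketch, assuming the worker function and its inputs are picklable; `process_one` and the small `species` list below are placeholders for the real method and the ~18,000 inputs:

from multiprocessing import Pool

def process_one(item):
    # Stand-in for export.get_uniref_uniprotkb_from_panproteome:
    # CPU-bound work runs here, in a separate process with its own GIL.
    return item.upper()

if __name__ == '__main__':
    species = ['sp1', 'sp2', 'sp3']  # placeholder input list
    with Pool(processes=8) as pool:
        failed = list(pool.imap(process_one, species, chunksize=300))

One thing to keep in mind with processes: each worker needs access to the large dictionaries, either by inheriting them via fork on Linux or by pickling the instance under spawn, so memory usage and start-up cost are worth watching.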

0 Answers