
I have a dict to store objects:

jobs = {}
job = Job()
jobs[job.name] = job

Now I want to convert it to a manager dict, because I want to use multiprocessing and need to share this dict among processes:

import multiprocessing

mgr = multiprocessing.Manager()
jobs = mgr.dict()
job = Job()
jobs[job.name] = job

Just by converting to manager.dict(), things got extremely slow.

For example, with a native dict it took only 0.65 seconds to create 625 objects and store them in the dict.

The very same task now takes 126 seconds!

Is there any optimization I can do to keep manager.dict() on par with a Python {}?

ealeon

2 Answers

14

The problem is that each insert is quite slow (117x slower on my machine), because every operation on a manager.dict() is pickled and sent to the manager's server process. If you fill a normal dict first and then update your manager.dict() with it, that is a single, fast operation.

import multiprocessing

jobs = {}
job = Job()
jobs[job.name] = job
# insert other jobs in the normal dictionary

mgr = multiprocessing.Manager()
mgr_jobs = mgr.dict()
mgr_jobs.update(jobs)

Then use the mgr_jobs variable.
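To see the difference on your own machine, here is a minimal timing sketch. The helper names insert_one_by_one and insert_in_bulk are made up for this example, and plain strings stand in for the Job objects from the question, so the absolute numbers will not match the ones quoted above.

```python
import multiprocessing
import time


def insert_one_by_one(shared, items):
    """Store each (key, value) pair with a separate assignment."""
    for key, value in items:
        shared[key] = value  # one round trip per item on a manager dict


def insert_in_bulk(shared, items):
    """Collect the pairs in a local dict, then push them with one update()."""
    local = dict(items)
    shared.update(local)  # a single round trip on a manager dict


if __name__ == "__main__":
    # Hypothetical stand-in for the Job objects: plain strings keyed by name.
    items = [(f"job-{i}", f"payload-{i}") for i in range(625)]

    with multiprocessing.Manager() as mgr:
        for fill in (insert_one_by_one, insert_in_bulk):
            shared = mgr.dict()
            start = time.perf_counter()
            fill(shared, items)
            elapsed = time.perf_counter() - start
            print(f"{fill.__name__}: {elapsed:.4f} s for {len(shared)} items")
```

Both helpers accept any dict-like object, so you can also pass a plain {} to get a baseline.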

Another option is to use the widely adopted multiprocessing.Queue class.
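A rough sketch of the Queue approach, assuming the parent process only needs to collect results from the workers (which may not fit if every process has to read the shared dict). The worker and drain functions and the job names are hypothetical.

```python
import multiprocessing


def worker(name, results):
    """Hypothetical worker: build a result and send it to the parent."""
    results.put((name, f"result of {name}"))


def drain(results, count):
    """Read `count` (key, value) pairs from the queue into a normal dict."""
    collected = {}
    for _ in range(count):
        key, value = results.get()  # blocks until an item arrives
        collected[key] = value
    return collected


if __name__ == "__main__":
    names = [f"job-{i}" for i in range(4)]
    results = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=worker, args=(name, results))
        for name in names
    ]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so workers are never stuck on a full pipe.
    jobs = drain(results, len(names))
    for proc in procs:
        proc.join()
    print(sorted(jobs))
```

The parent ends up with an ordinary dict, so lookups after collection cost the same as with {}.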

JBernardo
  • This would solve the delay in the initial creation of jobs, but what if I need to insert/delete many times? Any idea how lookup performs for manager.dict() compared to a regular dict? – ealeon Feb 12 '16 at 03:03
  • It looks like insertion is terrible for manager.Queue() as well. Any idea how lookup and deletion perform for manager.dict() compared to a regular dict? – ealeon Feb 12 '16 at 03:10
  • multiprocessing.Queue is much faster! It is still shareable between processes, right? Any idea why someone would use manager.Queue() over multiprocessing.Queue()? – ealeon Feb 12 '16 at 03:27
  • I wish they had a multiprocessing.dict(), because I do need to use a dict, hmm. – ealeon Feb 12 '16 at 03:37
  • I tried Queue too. Still ridiculously slow. Makes me want to go back to C. – sudo Aug 24 '17 at 18:59
4

If you are using mgr.dict() inside a loop in your pool, you can use a local normal dict to store results temporarily and then update your mgr.dict() once outside the loop, e.g. your_mgr_dict.update(local_dict).
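Something along these lines, as a sketch only: process_chunk is a hypothetical pool task, and squaring numbers stands in for whatever work your loop really does.

```python
import multiprocessing


def process_chunk(shared, chunk):
    """Hypothetical pool task: compute locally, then publish once."""
    local = {}
    for n in chunk:
        local[n] = n * n  # stand-in for the real work
    shared.update(local)  # one update per task instead of one write per item
    return len(local)


if __name__ == "__main__":
    chunks = [range(0, 5), range(5, 10)]
    with multiprocessing.Manager() as mgr:
        shared = mgr.dict()
        with multiprocessing.Pool(2) as pool:
            pool.starmap(process_chunk, [(shared, chunk) for chunk in chunks])
        print(dict(shared))
```

Each task talks to the manager once, so the cost grows with the number of tasks rather than the number of items.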