I have a master thread which communicates with a large number of worker threads. The worker threads continually send the master thread large dictionaries. These dictionaries are guaranteed to have non-overlapping keys. I would like to create a single, updatable view of these dicts which I can run a small number of queries on.

Existing solutions on this site only consider the case where we start off with all of the dicts that we want to merge (in which case a ChainMap would be appropriate).

Currently, I use multiprocessing.Queue() for communication with each worker and combine the incoming dictionaries manually into a single dictionary as follows:

store = {}
Q = <List of queues>
while True:
  for q in Q:
    store |= q.get()  # Only dictionaries received from queues

  keys = <generate a few keys of interest>
  for key in keys:
    if key in store:
      print(f"{key} -> {store[key]}")

I have noticed that my code speeds up dramatically (Python 3.9) if, instead of merging the dictionaries, I append them to a list and then iterate over that list:

store = []
Q = <List of queues>
while True:
  for q in Q:
    store.append(q.get())

  keys = <generate a few keys of interest>
  for key in keys:
    for element in store:
      if key in element:
        print(f"{key} -> {element[key]}")

My best guess for why this happens is that store |= d has to re-insert every key of d into store, making each merge O(len(d)) (I can't find a description of exactly how dicts are merged in the documentation), so the total merge cost is linear in the number of keys received, whereas appending to a list is O(1) per dict. Given the size of my dicts and the small number of queries per iteration, it's faster to just append them to a list.
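For concreteness, here is a minimal, self-contained sketch of the timing comparison (the dict sizes and counts are made up for illustration, and the queues are replaced by a pre-built list of dicts with non-overlapping keys):

import timeit

# Hypothetical workload: 100 incoming dicts of 10,000 keys each,
# with non-overlapping keys as in my actual setup.
incoming = [{f"k{i}_{j}": j for j in range(10_000)} for i in range(100)]

def merge_all():
  store = {}
  for d in incoming:
    store |= d  # re-inserts every key of d into store
  return store

def append_all():
  store = []
  for d in incoming:
    store.append(d)  # O(1) per dict, no key copying
  return store

print("merge: ", timeit.timeit(merge_all, number=10))
print("append:", timeit.timeit(append_all, number=10))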

Is there a more Pythonic way of accomplishing this than the above? collections.ChainMap is nearly what I want, but there doesn't appear to be a way to add dictionaries to an existing ChainMap in place (new_child produces a new ChainMap rather than mutating the current one).
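To illustrate what I mean about new_child (a minimal sketch):

from collections import ChainMap

cm = ChainMap({"a": 1})
cm2 = cm.new_child({"b": 2})  # returns a *new* ChainMap

print("b" in cm)   # False: the original chain is unchanged
print("b" in cm2)  # True
print(cm2["a"])    # 1: the new chain still sees the old maps

Rebinding cm = cm.new_child(d) inside the receive loop works, but it feels like a workaround rather than a genuinely updatable view.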
