I am trying to analyse a large shelve with multiprocessing.Pool
. Being with the read-only mode it should be thread safe, but it seams that a large object is being read first, then slowwwwly dispatched through the pool. Can this be done more efficiently?
Here is a minimal example of what I'm doing. Assume that test_file.shelf
already exist and is large (14GB+). I can see this snipped hugging 20GB of RAM, but only a small part of the shelve can be read at the same time (many more items than processors).
from multiprocessing import Pool
import shelve
def func(key_val):
print(key_val)
with shelve.open('test_file.shelf', flag='r') as shelf,\
Pool(4) as pool:
list(pool.imap_unordered(func, iter(shelf.items()))