
I'm implementing a multiprocessing program in Python, and each subprocess needs to read part of a file.

Since reading the file from disk is expensive, I want to read it only once and put it in shared memory.

1. If I use mmap, it works with fork, but I can't find a way to share the mmapped file between Processes created with the multiprocessing module.

2. If I read the file into a str and store it in sharedctypes.RawArray('c', str), an error occurs if there is a \0 in the str: the RawArray comes back as a truncated copy of the file.
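(For approach 2, the truncation is likely in how the array is read back, not in how it is built: ctypes char arrays stop at the first NUL byte when accessed through `.value`, while `.raw` returns the full buffer. A minimal sketch:)

```python
from multiprocessing.sharedctypes import RawArray

data = b"abc\x00def"       # file contents containing a NUL byte
arr = RawArray('c', data)  # the array itself stores all 7 bytes

print(arr.value)  # -> b'abc'        (.value stops at the first \0)
print(arr.raw)    # -> b'abc\x00def' (.raw returns the whole buffer)
```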

Any idea?
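(For approach 1, a mapping created before the child processes are started is inherited through fork, so no copy of the file data is made. A minimal sketch, assuming the default fork start method on Linux; the filename `data.bin` is just a stand-in for the real file:)

```python
import mmap
from multiprocessing import Process, Queue

def worker(mm, q, start, end):
    # The forked child inherits the mapping; slicing reads directly from it.
    q.put(bytes(mm[start:end]))

if __name__ == "__main__":
    # Hypothetical sample file standing in for the real data.
    with open("data.bin", "wb") as f:
        f.write(b"0123456789" * 10)

    with open("data.bin", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    q = Queue()
    # Passing mm as an arg relies on fork inheritance; it is not picklable,
    # so this does not work with the spawn start method.
    p = Process(target=worker, args=(mm, q, 0, 10))
    p.start()
    chunk = q.get()
    p.join()
    print(chunk)  # -> b'0123456789'
```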

Min Lin

1 Answer


Could you use a multiprocessing Manager? Make the mmapped file an attribute of the Namespace object returned by the Namespace() method, and pass a reference to it to each of the processes.

from multiprocessing import Manager

mgr = Manager()
ns = mgr.Namespace()
ns.df = my_dataframe

# now just give your processes access to ns, i.e. most simply
# p = Process(target=worker, args=(ns, work_unit))

(my answer is basically copied from here)
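(A runnable sketch of the idea above, using a bytes payload in place of the dataframe; note that each attribute access on the Namespace proxy transfers a copy of the object to the process that asks for it:)

```python
from multiprocessing import Manager, Process

def worker(ns, out, start, end):
    # Each worker pulls the shared data through the manager and
    # takes only its own slice.
    out["n"] = len(ns.data[start:end])

if __name__ == "__main__":
    mgr = Manager()
    ns = mgr.Namespace()
    ns.data = b"0123456789" * 100  # stands in for the file contents, read once
    out = mgr.dict()

    p = Process(target=worker, args=(ns, out, 0, 500))
    p.start()
    p.join()
    print(out["n"])  # -> 500
```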

Harry de winton