I have a very large file that loads in my main process. My goal is to have several processes read the same in-memory copy at the same time, both to avoid duplicating it in RAM and to make things faster.
According to this answer, I should use Shared ctypes Objects:
Manager types are built for flexibility not efficiency ... this necessarily means copying whatever object is in question. .... If you want shared physical memory, I suggest using Shared ctypes Objects. These actually do point to a common location in memory, and therefore are much faster, and resource-light.
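For context, this is how I understand a shared ctypes object is used; a minimal sketch (the buffer contents and names here are my own illustration, not from the linked answer):

import ctypes
import multiprocessing


def worker(shared_arr):
    # reads go straight to the shared buffer; nothing is copied into the child
    print(shared_arr[:10])


if __name__ == '__main__':
    data = b'aaabbbaa' * 100
    # RawArray allocates a single block of shared memory with no lock,
    # which should be fine here because the workers only read from it
    shared_arr = multiprocessing.RawArray(ctypes.c_char, data)
    p = multiprocessing.Process(target=worker, args=(shared_arr,))
    p.start()
    p.join()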
So I did this:
import time
import pickle
import multiprocessing
from functools import partial


def foo(_, v):
    tp = time.time()
    v = v.value  # fetch the string from the manager proxy
    print(hex(id(v)))
    print(f'took me {time.time()-tp} in process')


if __name__ == '__main__':
    # create a file which is about 800 MB
    with open('foo.pkl', 'wb') as file:
        pickle.dump('aaabbbaa' * int(1e8), file, protocol=pickle.HIGHEST_PROTOCOL)

    t1 = time.time()
    with open('foo.pkl', 'rb') as file:
        contract_conversion = pickle.load(file)
    print(f'load took {time.time()-t1}')

    m = multiprocessing.Manager()
    vm = m.Value(str, contract_conversion, lock=False)  # not locked because I only read from it, so it's safe
    foo_p = partial(foo, v=vm)

    tpo = time.time()
    with multiprocessing.Pool() as pool:
        pool.map(foo_p, range(4))
    print(f'took me {time.time()-tpo} for pool stuff')
However, I can see that each process uses a copy of it (the RAM usage of every process is very high), and it's MUCH slower than simply reading from disk.
The output:
load took 0.8662333488464355
0x1c736ca0040
took me 2.286606550216675 in process
0x15cc0404040
took me 3.178203582763672 in process
0x1f30f049040
took me 4.179721355438232 in process
0x21d2c8cc040
took me 4.913192510604858 in process
took me 5.251579999923706 for pool stuff
Also, the id is not the same in each process, though I am not sure whether id is simply a Python identifier or the memory location.
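For what it's worth, in CPython id() happens to be the object's memory address (an implementation detail, not a language guarantee), so the differing values above do suggest four separate copies. A quick CPython-only sketch of that relationship:

import ctypes

s = 'aaabbbaa'
# CPython implementation detail: id() returns the object's address,
# so casting that address back to a py_object yields the very same object
print(ctypes.cast(id(s), ctypes.py_object).value is s)  # True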