In each of the 4 processes of my parallelized big-computation job, I would like to test whether some number is a member of a big 4 GB set S.

The problem with my current approach is that t = Process(target=somefunc, args=(S,)) passes the 4 GB of data to each process, which is too much for my computer (4 × 4 = 16 GB)!

How can I use S as a global variable in this multiprocessing job, instead of having to pass (and duplicate) S for each process? Here is a simplified version of my code:
from multiprocessing import Process
from random import randint

def somefunc(S):
    a = randint(0, 100)  # simplified example
    print(a in S)

def main():
    S = set([1, 2, 7, 19, 13])  # here it's a 4 GB set in my real program
    processes = []
    for i in range(4):
        t = Process(target=somefunc, args=(S,))  # this passes S to each process, duplicating the 4 GB
        t.start()
        processes.append(t)
    for t in processes:  # join all workers, not just the last one
        t.join()

if __name__ == '__main__':
    main()
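For reference, this is the kind of global-variable version I have in mind (a minimal sketch, assuming a POSIX system where multiprocessing can use the fork start method, so children inherit the parent's memory copy-on-write; the set_start_method call and the module-level S are my assumptions, not something I've tested at 4 GB):

from multiprocessing import Process, set_start_method
from random import randint

S = None  # placeholder; filled in once by the parent before forking

def somefunc():
    # With the fork start method, the child inherits S from the parent
    # copy-on-write, so nothing is pickled or re-sent per process.
    a = randint(0, 100)  # simplified example
    print(a in S)

def main():
    global S
    S = set([1, 2, 7, 19, 13])  # 4 GB set in the real program
    processes = []
    for i in range(4):
        t = Process(target=somefunc)  # note: no args=(S,)
        t.start()
        processes.append(t)
    for t in processes:
        t.join()

if __name__ == '__main__':
    set_start_method('fork')  # default on Linux; not available on Windows
    main()

One caveat I'm aware of: CPython's reference counting writes to every object it touches, so heavy reads of the inherited set can still dirty (and thus copy) the memory pages holding those objects over time. I don't know if that matters at this scale.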
Note: I've already considered using a database with a client/server setup (or even just SQLite), but I really want the speed of set/dict lookup, which is orders of magnitude faster than a database call.