I have code like this:

from multiprocessing import Pool

def do_stuff(idx):
    for i in items[idx:idx+20]:
         # do stuff with idx

items = # a huge nested list
pool = Pool(5)
pool.map(do_stuff, range(0, len(items), 20))
pool.close()
pool.join()

The issue is that the pool does not share items; instead it creates a copy for each worker, which is a problem because the list is huge and hogs memory. Is there a way to implement this so that items is shared? I found some examples using a global that work with the basic threading library, but that does not seem to apply to the multiprocessing library.

Thanks!

PapeK24

1 Answer

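Threads and multiprocessing are not at all interchangeable.

Threads all run inside one process and share its memory (the Global Interpreter Lock only serializes bytecode execution), so sharing variables between threads is easy. multiprocessing sidesteps the GIL by starting separate processes, each with its own memory, so data such as items has to be copied or pickled to each worker rather than shared directly.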

A better way to do this would be to return the result of do_stuff from each worker and then combine the results.
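The idea of returning a result from each call and combining the results in the parent can be sketched as follows. This is only a sketch under stated assumptions: the small `items` list, the `CHUNK` constant, the summing done in `do_stuff`, and the `run` helper are hypothetical stand-ins, and it assumes Python 3 on a Unix-like system where the "fork" start method is available. With fork, workers inherit the module-level list instead of receiving a pickled copy, although reference counting can still cause some memory pages to be copied over time.

```python
import multiprocessing as mp

# Hypothetical stand-in for the huge nested list from the question.
items = [[n, n * 2] for n in range(100)]
CHUNK = 20  # hypothetical chunk size, matching the 20 used in the question


def do_stuff(idx):
    # Reads the module-level list inherited from the parent process;
    # only the integer idx is sent to the worker.
    return sum(row[1] for row in items[idx:idx + CHUNK])


def run():
    # Assumption: "fork" is available (Unix-like OS, Python 3.4+).
    ctx = mp.get_context("fork")
    pool = ctx.Pool(5)
    try:
        partial_results = pool.map(do_stuff, range(0, len(items), CHUNK))
    finally:
        pool.close()
        pool.join()
    # Combine the per-chunk results in the parent process.
    return sum(partial_results)


if __name__ == "__main__":
    print(run())
```

Only the start index travels to each worker and only a small partial result travels back, so the large list itself is never pickled.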

Look at the documentation here: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

In your case it looks like you should use it like this:

from multiprocessing import Pool

def do_stuff(idx):
    for i in items[idx:idx+20]:
         # do stuff with idx

items = # a huge nested list
pool = Pool(5)
multiple_results = [pool.apply_async(do_stuff, (i,)) for i in range(0, len(items), 20)]  # args must be a tuple
multiple_results = [res.get(timeout=1) for res in multiple_results]

edit on basis of comment:

from multiprocessing import Pool

def do_stuff(items):
    for i in items:
         # do stuff with i

items = # a huge nested list
pool = Pool(5)
pool.map(do_stuff, [items[i:i+20] for i in range(0, len(items), 20)]) # generating a list of lists of twenty items for each worker to work on
pool.close()
pool.join()
Ryan Schaefer
  • There is no writing to the list involved; it is only read, some calculations happen, and the results go to a DB. I just want to speed it up. – PapeK24 Aug 16 '18 at 17:53
  • @PapeK24 Oh, I understand now. In that case this is an XY problem. You should instead pass a slice of the main list to each worker in the pool; I will update my answer to reflect that. – Ryan Schaefer Aug 16 '18 at 17:55
  • Yeah, that is still not a solution. There is a list lookup involved. I just need that list variable to point to the same place in memory somehow. – PapeK24 Aug 16 '18 at 17:57
  • It may even be impossible for the workers in the pool to share memory if they are literally different processes. I am not sure how it is implemented. – PapeK24 Aug 16 '18 at 17:59
  • @PapeK24 What do you mean by "there is a list lookup"? – Ryan Schaefer Aug 16 '18 at 17:59
  • Well, the for loop does not literally go over the list like `for i in items`. There is a different loop over some DB results, and the results get matched against parts of the list. I just used this for simplicity. The lookup is done over the whole list. Really, this can be solved only by getting the threads to look into the same memory when operating on that list; otherwise threading is useless for me and I have to abandon it altogether. – PapeK24 Aug 16 '18 at 18:05
  • Read this: https://stackoverflow.com/questions/659865/multiprocessing-sharing-a-large-read-only-object-between-processes @PapeK24 – Ryan Schaefer Aug 16 '18 at 18:14
  • Thanks, but these solutions seem too complex. I do not need to use multiprocessing specifically. I can use basically anything that can share read-only memory easily, ideally without locks. Is there such a concurrency library in Python 2.7? – PapeK24 Aug 16 '18 at 18:18
  • The only library with true concurrency is multiprocessing @PapeK24 – Ryan Schaefer Aug 16 '18 at 18:21
  • So you are telling me there is no simple way to share read-only memory between threads in Python? – PapeK24 Aug 16 '18 at 18:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/178160/discussion-between-ryan-schaefer-and-papek24). – Ryan Schaefer Aug 16 '18 at 18:49
  • I solved it by using `multiprocessing.dummy.ThreadPool` instead. It seems to be able to share memory and has the same interface. – PapeK24 Aug 17 '18 at 09:24
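The thread-based approach from the last comment can be sketched as below. This is a minimal illustration, not the asker's actual code: the small `items` list, the `CHUNK` constant, the `run` helper, and the work done in `do_stuff` are hypothetical, and the pool is obtained through the `multiprocessing.dummy.Pool` factory, which returns a thread pool with the same interface as the process pool. Each call also returns `id(items)` purely to show that every worker sees the same list object.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool

# Hypothetical stand-in for the huge nested list from the question.
items = [[n, n * 2] for n in range(100)]
CHUNK = 20  # hypothetical chunk size, matching the 20 used in the question


def do_stuff(idx):
    # Threads live in one process, so this is the very same list object;
    # nothing is copied or pickled.
    chunk_total = sum(row[1] for row in items[idx:idx + CHUNK])
    return id(items), chunk_total


def run():
    pool = ThreadPool(5)
    try:
        return pool.map(do_stuff, range(0, len(items), CHUNK))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    results = run()
    # Every worker reports the identity of the one shared list.
    print(all(obj_id == id(items) for obj_id, _ in results))
```

Because of the GIL in standard CPython builds, threads do not run Python bytecode in parallel, so this mainly helps when the work spends its time waiting on I/O such as database calls.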