
I have the following code.

from multiprocessing import Pool
import os

a = [1, 2, 3]
b = [2, 3, 4]

def multi_run_wrapper(args):
    return add(*args)

def add(x, y):
    return x + y

if __name__ == "__main__":
    pool = Pool(os.cpu_count())
    results = pool.starmap(add, zip(a, b))
    print(sum(results))

The output is results = [3, 5, 7], and it prints 15. So to compute sum(results) this way, I need to store the whole results list. Is there a way of using multiprocessing without saving the whole list, i.e. summing the results while multiprocessing runs? If my lists a and b become very long, the results list would take too much memory and won't fit on my laptop. In other words, my goal is to get sum(results) without storing the whole list, while still speeding up the computation.

Thank you

G-09
  • Feels like Map/Reduce (https://pymotw.com/2/multiprocessing/mapreduce.html) would do what you want. It's built for large data sets. – saquintes Jun 10 '21 at 10:08
  • "Is this question fundamentally asking the same thing as that other question?" : NO. https://meta.stackoverflow.com/a/408268/7317733 Please make sure before closing the question as duplicate. – AvidJoe Jun 10 '21 at 12:37
  • @AvidJoe Yes, it is. The linked question is about an iterative starmap, or in other words a starmap "without saving the whole list". Whether the iterative starmap is used to sum or to display progress is inconsequential. FWIW, a progress bar *does* perform an internal sum over the element count. Please make sure to provide a reason *why* a duplicate is not appropriate when challenging it. – MisterMiyagi Jun 10 '21 at 12:39
  • @MisterMiyagi https://meta.stackoverflow.com/a/292372/7317733 . I am new here and do not know what to make of these posts, but I feel this link explains what I want to say. So is the OP, after just reading the linked question, supposed to go to tqdm and understand that "a progress bar does an internal sum over the element count" to even begin to understand what's going on? – AvidJoe Jun 10 '21 at 14:22
  • @AvidJoe They are supposed to see that the other question comes down to an iterative starmap, which is what they asked about. As a simile like in your second link, they asked "how to implement the ``*`` in ``2*13``" and the other question asked "how to implement the ``*`` in ``4.5*27.3``". – MisterMiyagi Jun 10 '21 at 16:11
  • @MisterMiyagi Thanks for the comment. I think i understand now. – AvidJoe Jun 10 '21 at 16:14
  • @Gina09 You can find the duplicate question in the header above this question. The header should also give you a link detailing why we do this and how you can challenge it if you disagree. The direct link to the appropriate help center page is [this](https://stackoverflow.com/help/duplicates). – MisterMiyagi Jun 10 '21 at 16:19
  • @Gina09 The TLDR is that you can add the ``istarmap`` method as shown in the other question, then use ``results = pool.istarmap(add,zip(a,b))`` instead of ``results = pool.starmap(add,zip(a,b))``. – MisterMiyagi Jun 10 '21 at 16:22
  • I don't get why this solution lets me get sum(results). – G-09 Jun 10 '21 at 16:30
  • The goal is to get sum(results), and I didn't see an answer to that in the link. – G-09 Jun 10 '21 at 16:40
  • @Gina09 Because the ``pool.istarmap`` (from the other question) is to ``pool.starmap`` what ``pool.imap`` is to ``pool.map``: it does the same operation as its regular variant, but instead of storing all results into a list it yields each result as soon as it becomes available. ``sum`` can work with both iterables (what ``imap``/``istarmap`` provides) and lists (what ``map``/``starmap`` provides). – MisterMiyagi Jun 10 '21 at 16:40
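
To make the comment thread concrete: the standard library alone already covers the imap route. The question's own multi_run_wrapper lets pool.imap stand in for starmap, and sum() then consumes each result as it arrives instead of materializing a list. A minimal sketch (the chunksize value is an illustrative choice for long inputs):

from multiprocessing import Pool
import os

a = [1, 2, 3]
b = [2, 3, 4]

def add(x, y):
    return x + y

def multi_run_wrapper(args):
    # unpack the (x, y) tuple so imap can be used like starmap
    return add(*args)

if __name__ == "__main__":
    with Pool(os.cpu_count()) as pool:
        # imap yields results one at a time as workers finish,
        # so sum() never holds the full result list in memory
        total = sum(pool.imap(multi_run_wrapper, zip(a, b), chunksize=1024))
    print(total)  # prints 15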

1 Answer


When processes run in a self-contained environment, you definitely have fewer things to worry about :) . If you do need shared state, you can use a Queue object to update a single running total and achieve inter-process communication this way.

from multiprocessing import Process, Queue

a = [1, 2, 3]
b = [2, 3, 4]

def add(queue, x, y):
    # take the running total off the queue, add to it, put it back;
    # the blocking get serializes access, so updates don't race
    queue.put(queue.get(block=True) + x + y)

if __name__ == "__main__":
    queue = Queue()
    queue.put(0)  # initial running total
    processes = [Process(target=add, args=(queue, x, y)) for x, y in zip(a, b)]
    for each in processes:
        each.start()
    for each in processes:
        each.join()
    print(queue.get(block=True))  # prints 15
    queue.close()
    queue.join_thread()
  • If you want to do it with a Pool instead:
from multiprocessing import Pool, Queue
import os

a = [1, 2, 3]
b = [2, 3, 4]

def init_worker(q):
    # expose the queue as a global inside each pool worker; a
    # multiprocessing.Queue can only be shared through inheritance,
    # so it must be passed at worker start-up, not per task
    global queue
    queue = q

def add(x, y):
    # update the single running total instead of returning a result
    queue.put(queue.get(block=True) + x + y)

if __name__ == "__main__":
    queue = Queue()
    queue.put(0)  # initial running total
    with Pool(os.cpu_count(), initializer=init_worker, initargs=(queue,)) as pool:
        pool.starmap(add, zip(a, b))
    print(queue.get(block=True))  # prints 15
    queue.close()
    queue.join_thread()
  • Here you aren't storing the entire list of outputs from each process; instead every process updates a single shared running total. (A lock-protected alternative is sketched after these notes.)
  • I came up with this quickly and haven't tested it thoroughly, though.
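
A variation on the same shared-accumulator idea, sketched under the assumption that a plain multiprocessing.Value (whose built-in lock guards the running total) is acceptable; the init_worker and total names here are illustrative:

from multiprocessing import Pool, Value
import os

a = [1, 2, 3]
b = [2, 3, 4]

def init_worker(v):
    # expose the shared counter as a global inside each pool worker
    global total
    total = v

def add(x, y):
    # get_lock() returns the lock guarding the shared value,
    # so concurrent updates from different workers don't race
    with total.get_lock():
        total.value += x + y

if __name__ == "__main__":
    total = Value('i', 0)  # shared C int holding the running total
    with Pool(os.cpu_count(), initializer=init_worker, initargs=(total,)) as pool:
        pool.starmap(add, zip(a, b))
    print(total.value)  # prints 15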
AvidJoe