
I am having difficulty understanding how to use Python's multiprocessing module.

I have a sum from 1 to n where n = 10^10, which is too large for the range to fit into a list; building a list seems to be the approach taken in many of the multiprocessing examples online.

Is there a way to "split up" the range into segments of a certain size and then perform the sum for each segment?

For instance:

def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result

And I want to compute sum_nums(1, 10**10) by breaking it up into many smaller sums: sum_nums(1,1000) + sum_nums(1001,2000) + sum_nums(2001,3000)... and so on. I know there is a closed-form formula n(n+1)/2, but pretend we don't know that.
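Concretely, I imagine generating the segment boundaries with something like this (just a sketch of the split I have in mind, not the multiprocessing part):

def segments(n, size):
    # yield (low, high) pairs covering 1..n in chunks of `size`
    for low in range(1, n + 1, size):
        yield (low, min(low + size - 1, n))

print(list(segments(3000, 1000)))  # [(1, 1000), (1001, 2000), (2001, 3000)]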

Here is what I've tried:

import multiprocessing

def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result

if __name__ == "__main__":
    n = 1000 
    procs = 2 

    sizeSegment = n/procs

    jobs = []
    for i in range(0, procs):
        process = multiprocessing.Process(target=sum_nums, args=(i*sizeSegment+1, (i+1)*sizeSegment))
        jobs.append(process)

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    #where is the result?
user4817101
  • so your question is `how do I split a big list into N smaller lists`? – Joran Beasley Apr 22 '15 at 00:21
  • @JoranBeasley No. My question is how to use multiprocessing to compute a function over many segments and join-sum the results. I added some code above. – user4817101 Apr 22 '15 at 00:22
  • Look at the answers for [this](http://stackoverflow.com/questions/2846653/python-multithreading-for-dummies/29774367#29774367) question - should give you a good idea – letsc Apr 22 '15 at 00:23
  • @letsc I have already come across that question and didn't find what I needed – user4817101 Apr 22 '15 at 00:23
  • you don't return the value; you need to feed results into a `multiprocessing.Pipe` or `Queue` that something is listening to (or some other form of message passing, sockets etc.); see the sketch after these comments – Joran Beasley Apr 22 '15 at 00:26
  • @JoranBeasley I don't fully understand. What do I need to add to my code to make it return the sum of each process? – user4817101 Apr 22 '15 at 00:27
  • you probably want to use a multiprocessing pool - the map option sounds exactly like what you're looking for – Noah Apr 22 '15 at 00:29
  • @Noah Is there a good example of using the "map option" for my intended use? – user4817101 Apr 22 '15 at 00:32
  • @bj0's answer pretty much does it; the `Pool.map()` command seems a little more intuitive to me than using `Pool.apply_async()`, but they're basically doing the same thing – Noah Apr 23 '15 at 01:22
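To make the `Pipe`/`Queue` suggestion from the comments concrete, here is a minimal sketch of the question's code extended with a `multiprocessing.Queue`, so each worker can send its partial sum back to the parent (`sum_nums_worker` is a hypothetical variant of the question's `sum_nums`):

import multiprocessing

def sum_nums_worker(low, high, queue):
    result = 0
    for i in range(low, high + 1):
        result += i
    queue.put(result)  # send the partial sum back to the parent

if __name__ == "__main__":
    n = 1000
    procs = 2
    sizeSegment = n // procs

    queue = multiprocessing.Queue()
    jobs = []
    for i in range(procs):
        process = multiprocessing.Process(
            target=sum_nums_worker,
            args=(i * sizeSegment + 1, (i + 1) * sizeSegment, queue))
        jobs.append(process)

    for j in jobs:
        j.start()

    # collect one partial sum per worker (drain the queue before joining)
    total = sum(queue.get() for _ in jobs)

    for j in jobs:
        j.join()

    print(total)  # 500500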

3 Answers


I find the usage of `multiprocessing.Pool` and `map()` much simpler.

Using your code:

from multiprocessing import Pool

def sum_nums(args):
    low = int(args[0])
    high = int(args[1])
    return sum(range(low,high+1))

if __name__ == "__main__":
    n = 1000 
    procs = 2 

    sizeSegment = n/procs

    # Create size segments list
    jobs = []
    for i in range(0, procs):
        jobs.append((i*sizeSegment+1, (i+1)*sizeSegment))

    pool = Pool(procs)
    result = sum(pool.map(sum_nums, jobs))
    print result  # 500500
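The snippet above is Python 2 (`n/procs` relies on integer division, and `print` is a statement). A minimal Python 3 variant of the same idea:

from multiprocessing import Pool

def sum_nums(args):
    low, high = args
    return sum(range(low, high + 1))

if __name__ == "__main__":
    n = 1000
    procs = 2
    sizeSegment = n // procs  # // keeps the segment size an integer in Python 3

    # same (low, high) segments as above, built with a list comprehension
    jobs = [(i * sizeSegment + 1, (i + 1) * sizeSegment) for i in range(procs)]

    with Pool(procs) as pool:
        result = sum(pool.map(sum_nums, jobs))

    print(result)  # 500500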
ODiogoSilva

You can do this sum without multiprocessing at all, and it's probably simpler, if not faster, to just use generators.

# prepare a generator of generators each at 1000 point intervals
>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> list(xr)[:3]
[xrange(1, 1001), xrange(1001, 2001), xrange(2001, 3001)]
# sum, using two map functions
>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> sum(map(sum, map(lambda x:x, xr)))
50000000005000000000L

However, if you want to use multiprocessing, you can do that too. I'm using a fork of multiprocessing that is better at serialization (but otherwise not really different).

>>> xr = (xrange(1000*i+1,i*1000+1001) for i in xrange(10000000))
>>> import pathos
>>> mmap = pathos.multiprocessing.ProcessingPool().map
>>> tmap = pathos.multiprocessing.ThreadingPool().map
>>> sum(tmap(sum, mmap(lambda x:x, xr)))
50000000005000000000L

The version without multiprocessing is faster, and takes about a minute on my laptop. The multiprocessing version takes a few minutes, due to the overhead of spawning multiple Python processes.
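With the stdlib `Pool`, part of that overhead can be amortized with `map`'s `chunksize` argument, which batches many small tasks into each message between the parent and the workers. A sketch under Python 3 (`chunk_sum` is a hypothetical helper, and 10000 is an arbitrary batch size):

from multiprocessing import Pool

def chunk_sum(i):
    # sum of the i-th block of 1000 consecutive integers
    return sum(range(1000 * i + 1, 1000 * i + 1001))

if __name__ == '__main__':
    with Pool() as pool:
        # a large chunksize means fewer, bigger parent<->worker messages
        total = sum(pool.map(chunk_sum, range(10**7), chunksize=10000))
    print(total)  # 50000000005000000000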

If you are interested, get pathos here: https://github.com/uqfoundation

Mike McKerns

First, the best way to get around the memory issue is to use an iterator/generator instead of a list:

def sum_nums(low, high):
    result = 0
    for i in xrange(low, high+1):
        result += i
    return result

In Python 3, `range()` is already lazy (it returns a `range` object rather than a list), so this substitution is only needed in Python 2.

Now, where multiprocessing comes in is when you want to split the processing across different processes or CPU cores. If you don't need to control the individual workers, then the easiest method is to use a process pool. This will let you map a function over the pool and get the output. You can alternatively use `apply_async` to submit jobs to the pool one at a time and get a delayed result, which you can retrieve with `.get()`:

import multiprocessing
from multiprocessing import Pool
from time import time

def sum_nums(low, high):
    result = 0
    for i in xrange(low, high+1):
        result += i
    return result

# map requires a function to handle a single argument
def sn((low,high)):
    return sum_nums(low, high) 

if __name__ == '__main__': 
    #t = time()
    # takes forever   
    #print sum_nums(1,10**10)
    #print '{} s'.format(time() -t)
    p = Pool(4)

    n = int(1e8)
    r = range(0,10**10+1,n)
    results = []

    # using apply_async
    t = time()
    for arg in zip([x+1 for x in r],r[1:]):
        results.append(p.apply_async(sum_nums, arg))

    # wait for results
    print sum(res.get() for res in results)
    print '{} s'.format(time() -t)

    # using process pool
    t = time()
    print sum(p.map(sn, zip([x+1 for x in r], r[1:])))
    print '{} s'.format(time() -t)

On my machine, just calling sum_nums with 10**10 takes almost 9 minutes, but using a Pool(8) and n=int(1e8) reduces this to just over a minute.
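One portability note: the `def sn((low, high)):` helper above is Python 2 only; tuple parameters were removed in Python 3 (PEP 3113), so the Python 3 equivalent unpacks inside the function:

def sn(bounds):
    # Python 3: unpack manually, since tuple parameters are gone (PEP 3113)
    low, high = bounds
    return sum_nums(low, high)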

bj0
  • This code just created countless copies of Python on my machine and I had to reboot – user4817101 Apr 22 '15 at 00:57
  • 1
    @user4817101 That's because it's missing the `if __name__ == "__main__":` guard, and you're running on Windows. Add it above `Pool(4)` and indent everything accordingly and it should at least run without killing your machine. – dano Apr 22 '15 at 13:51
  • @dano i didn't know windows behaved that way (i'm using linux). Edited the code. – bj0 Apr 22 '15 at 16:31