I wrote a simple Monte Carlo π calculation program in Python, using the multiprocessing module. It works just fine, but when I pass 1E+10 iterations for each worker, some problem occurs and the result is wrong. I can't understand what the problem is, because everything is fine with 1E+9 iterations!

import sys
from multiprocessing import Pool
from random import random


def calculate_pi(iters):
    """ Worker function """

    points = 0  # points inside circle

    for i in iters:
        x = random()
        y = random()

        if x ** 2 + y ** 2 <= 1:
            points += 1

    return points


if __name__ == "__main__":

    if len(sys.argv) != 3:
        print "Usage: python pi.py workers_number iterations_per_worker"
        exit()

    procs = int(sys.argv[1])
    iters = float(sys.argv[2])  # 1E+8 is cool

    p = Pool(processes=procs)

    total = iters * procs
    total_in = 0

    for points in p.map(calculate_pi, [xrange(int(iters))] * procs):
        total_in += points

    print "Total: ", total, "In: ", total_in
    print "Pi: ", 4.0 * total_in / total
sashab
  • What is the wrong result you're getting? – Amir Rachum Sep 24 '12 at 17:33
  • @AmirRachum π comes out as ~0.4; the total number of iterations is correct. – sashab Sep 24 '12 at 17:38
  • does this happen regardless of what `procs` is? what sort of values are you using for `procs`? – Matti Lyra Sep 24 '12 at 17:48
  • Why is iters a float rather than an int? What does it mean to have, say, 10000.0001 iterations? – abarnert Sep 24 '12 at 17:55
  • @MattiLyra there is no division by the `procs` number, so it should not. I'll test it later. @abarnert for the exponent notation: 1E+NUM is useful. – sashab Sep 24 '12 at 17:57
  • `int` can't parse `1E+8` – Matti Lyra Sep 24 '12 at 17:57
  • could your provide `print sys.argv[1:], xrange(int(float(sys.argv[2])))` for `sys.argv` that produces the wrong result? – jfs Sep 24 '12 at 17:59
  • @scrat No I wasn't thinking that, but that you exhaust `random.random()` as that is shared by all processes, although that's fairly unlikely as the period is `2**19937-1`. – Matti Lyra Sep 24 '12 at 18:02
  • @J.F.Sebastian `['4', '1E+10'] xrange(10000000000)` – sashab Sep 24 '12 at 18:03
  • @MattiLyra: Yes, but if you give it a number that isn't exactly an integer (including if you try to give it an integer that isn't exactly representable as a float), then `total` ends up wrong, and therefore `total_in / total` is also wrong. So, it would be better to do `iters = int(float(sys.argv[2]))` in the first place. That didn't turn out to be the problem here, but it could. – abarnert Sep 24 '12 at 18:19

1 Answer

The problem seems to be that multiprocessing has a limit on the largest int it can pass to subprocesses inside an xrange. Here's a quick test:

#!/usr/bin/env python
import sys
from multiprocessing import Pool

def doit(n):
    # Echo back whatever object the subprocess actually received.
    print n

if __name__ == "__main__":
    procs = int(sys.argv[1])
    iters = int(float(sys.argv[2]))
    p = Pool(processes=procs)
    for points in p.map(doit, [xrange(iters)] * procs):
        pass

Now:

$ ./multitest.py 2 1E8
xrange(100000000)
xrange(100000000)
$ ./multitest.py 2 1E9
xrange(1000000000)
xrange(1000000000)
$ ./multitest.py 2 1E10
xrange(1410065408)
xrange(1410065408)

This is part of a more general problem with multiprocessing: It relies on standard Python pickling, with some minor (and not well documented) extensions to pass values. Whenever things go wrong, the first thing to check is that the values are arriving the way you expected.

In fact, you can see this problem by playing with pickle, without even touching multiprocessing (which isn't always the case, because of those minor extensions, but often is):

>>> import pickle
>>> pickle.dumps(xrange(int(1E9)))
'c__builtin__\nxrange\np0\n(I0\nI1000000000\nI1\ntp1\nRp2\n.'
>>> pickle.dumps(xrange(int(1E10)))
'c__builtin__\nxrange\np0\n(I0\nI1410065408\nI1\ntp1\nRp2\n.'

Even without learning all the details of the pickle protocol, it should be obvious that the I1000000000 in the first case is 1E9 as an int, while the equivalent chunk of the second case is about 1.41E9, not 1E10. You can experiment to see exactly where that number comes from.
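
It is exactly 1E10 wrapped around a 32-bit C int, which you can confirm with plain arithmetic plus a pickle round-trip (a quick sketch; the round-trip assumes the same 64-bit CPython 2.x build that produced the dumps above):

>>> 10000000000 % 2**32
1410065408
>>> import pickle
>>> pickle.loads(pickle.dumps(xrange(int(1E10))))
xrange(1410065408)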

One obvious solution to try is to pass int(iters) instead of xrange(int(iters)), and let calculate_pi create the xrange from its argument. (Note: In some cases an obvious transformation like this can hurt performance, maybe badly. But in this case, it's probably slightly better if anything—a simpler object to pass, and you're parallelizing the xrange construction—and of course the difference is so tiny it probably won't matter. Just make sure to think before blindly transforming.)

And a quick test shows that this now works:

#!/usr/bin/env python
import sys
from multiprocessing import Pool

def doit(n):
    # Build the xrange inside the worker, from a plain int argument.
    print xrange(n)

if __name__ == "__main__":
    procs = int(sys.argv[1])
    iters = int(float(sys.argv[2]))
    p = Pool(processes=procs)
    for points in p.map(doit, [iters] * procs):
        pass

Then:

$ ./multitest.py 2 1E10
xrange(10000000000)
xrange(10000000000)

However, you will still run into a larger limit:

$ ./multitest.py 2 1E100
OverflowError: Python int too large to convert to C long

Again, it's the same basic problem. One way to solve that is to pass the arg all the way down as a string, and do the int(float(a)) inside the subprocesses.
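
For concreteness, here is a minimal sketch of that string-passing idea applied to the original pi.py (an untested illustration; note that xrange itself is still limited to a C long, so a count like 1E100 would need a plain counting loop inside the worker instead):

#!/usr/bin/env python
import sys
from multiprocessing import Pool
from random import random

def calculate_pi(iters_str):
    # Convert inside the subprocess, so only a short string gets pickled.
    iters = int(float(iters_str))
    points = 0  # points inside circle
    for i in xrange(iters):
        x = random()
        y = random()
        if x ** 2 + y ** 2 <= 1:
            points += 1
    return points

if __name__ == "__main__":
    procs = int(sys.argv[1])
    iters = int(float(sys.argv[2]))
    p = Pool(processes=procs)
    total = iters * procs
    total_in = sum(p.map(calculate_pi, [sys.argv[2]] * procs))
    print "Total: ", total, "In: ", total_in
    print "Pi: ", 4.0 * total_in / total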

As a side note: The reason I'm doing iters = int(float(sys.argv[2])) instead of just iters = float(sys.argv[2]) and then using int(iters) later is to avoid accidentally using the float iters value later on (as the OP's version does, in computing total and therefore total_in / total).

And keep in mind that if you get to big enough numbers, you run into the limits of the C double type: 1E23 is typically 99999999999999991611392, not 100000000000000000000000.
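
You can see that limit directly in the interpreter (ordinary IEEE 754 double behavior, nothing to do with multiprocessing):

>>> int(float("1E23"))
99999999999999991611392L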

abarnert
  • So why does 1000000000 iterations give you the correct PI estimate but 1410065408 iterations doesn't? – Matti Lyra Sep 24 '12 at 18:11
  • @MattiLyra: As a first guess, if you're doing 1410065408 iterations but think you're doing 1000000000 you're going to end up dividing wrong at the end. But I haven't tested or really thought it through. – abarnert Sep 24 '12 at 18:15
  • In python 2.7.3 I get `>>> pickle.dumps(xrange(int(1E10))) OverflowError: Python int too large to convert to C long`; in python 3.2.3 I get `>>> pickle.dumps(range(int(1E10)),protocol=0) b'c__builtin__\nxrange\np0\n(L0L\nL10000000000L\nL1L\ntp1\nRp2\n.'` – Xavier Combelle Sep 24 '12 at 18:29
  • @XavierCombelle: It's not just about 2.7.3 vs. 3.2.3; it's also about 32-bit vs. 64-bit, different platforms and compilers, etc. If all numbers in your range happen to work on your implementation, you can ignore the problem (unless you might want a larger range or more portability later); otherwise, you have to deal with it. – abarnert Sep 24 '12 at 18:34
  • In python 2.7.3 under Ubuntu 32 bits it doesn't exactly work; it fails loudly. I wonder exactly which versions fail silently. – Xavier Combelle Sep 24 '12 at 18:40
  • @XavierCombelle: It fails silently in the stock 64-bit python 2.7.2 on Mountain Lion. My guess is it'll probably do the same on 2.7 on any platform where a C long is 64 bits but a C int is 32 bits (so, e.g., 64-bit Mac or linux, but not Win64, or any 32-bit platform). Of course to be sure I'd have to either test a bunch of platforms or read the source. – abarnert Sep 24 '12 at 18:52
  • This issue is tracked in Python core development: http://bugs.python.org/issue16029 – Xavier Combelle Sep 25 '12 at 09:37