
I am testing the Python multiprocessing module to read from different buffer offsets from the same file. The offsets are known a priori and are indexed as 'rows'.

The code looks like this:

import multiprocessing as mp

def get_object(row):
    # file.get seeks to the row's offset and returns the data object.
    return file.get(row)

rows = range(len(file))  # The row ids.
pool = mp.Pool()
results = pool.map(get_object, rows)
print(results)

This raises an OverflowError. There are plenty of posts on this site about overflow errors, but they are generally due to range issues or people trying to create gigantic lists. The number of rows here is anywhere from 1,024 to 100,000, which is not too large at all.

I can print the results and see that the error is occurring at the end of the iterable. I believe that this has something to do with how map is joining the list of objects. It should be maintaining the order, so I should not be having any issue there.

The error: OverflowError: Python int too large to convert to C long
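
One way to narrow down which row triggers the error is to catch exceptions inside the worker and return them as data, so that a single bad row does not abort the whole map. The following is only a minimal sketch: `fetch` is a hypothetical stand-in for `file.get(row)` that fails on its last row, `safe_fetch` is an illustrative name, and `ThreadPool` from `multiprocessing.pool` is used instead of `mp.Pool` purely to keep the example self-contained.

```python
from multiprocessing.pool import ThreadPool

def fetch(row):
    # Hypothetical stand-in for file.get(row); the last row fails on purpose.
    if row == 4:
        raise OverflowError("Python int too large to convert to C long")
    return "object-%d" % row

def safe_fetch(row):
    # Return the error as data instead of letting it propagate out of map.
    try:
        return (row, fetch(row), None)
    except Exception as exc:
        return (row, None, repr(exc))

pool = ThreadPool(2)
results = pool.map(safe_fetch, range(5))
pool.close()
pool.join()

# Rows whose worker call raised an exception.
failed_rows = [row for row, _, err in results if err is not None]
```

If the failing rows cluster at the end of the iterable, that would point at the data for those rows rather than at how map assembles the result list.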

Update: Looking at the source, the error is raised at line 528 of pool.py (Enthought Python 2.7), in the get method of the ApplyResult class.

Here is the function and the walkthrough that I believe the code is taking:

def get(self, timeout=None):
    self.wait(timeout)
    if not self._ready:
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value

This is called by map in the function above. The self._ready check is passing, since I am not seeing a TimeoutError. The self._success check calls the function successful:

def successful(self):
    assert self._ready
    return self._success

So the get function then rechecks whether the jobs have finished. Presumably they have since we just passed that check in the previously called if statement. The return value, which should be an object (not an int), then overflows.
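
For reference, the `raise self._value` branch of get shown above re-raises, in the parent, an exception that was captured while a task was running, so the traceback pointing at pool.py does not by itself mean the error originated there. A minimal sketch of that behaviour, assuming a hypothetical `get_object` that fails on its last row and using `ThreadPool` from `multiprocessing.pool` (a subclass of `Pool` whose results go through the same get method) to keep it self-contained:

```python
from multiprocessing.pool import ThreadPool

def get_object(row):
    # Hypothetical stand-in for file.get(row): fails on the last row.
    if row == 9:
        raise OverflowError("Python int too large to convert to C long")
    return row * 2

def run(rows):
    pool = ThreadPool(2)
    try:
        # map() calls get() on its result object internally.
        return pool.map(get_object, rows)
    except OverflowError as exc:
        # The worker's exception surfaces here, raised from get().
        return "re-raised in parent: %s" % exc
    finally:
        pool.close()
        pool.join()

outcome = run(range(10))
```

Under these assumptions the OverflowError raised inside `get_object` is what reaches the caller of `pool.map`, which suggests looking at what the worker does for the failing row.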

Ideas?

Jzl5325
  • Note that `[x for x in range(len(file))]` is the same as `range(len(file))` except slower. And that definition of `get_object` cannot work. – Fred Foo May 31 '13 at 15:53
  • @larsmans 'and that definition of get_object cannot work.' - can you expand upon this? Why not? – Jzl5325 May 31 '13 at 16:28
  • It ignores its argument and uses some index `i` instead. That must be a mistake. – Fred Foo Jun 01 '13 at 10:38
  • @larsmans Updated with the syntax correction. – Jzl5325 Jun 04 '13 at 14:04
  • What OS? Are you by any chance a victim of an infinite multiprocessing loop (i.e. lack of `if __name__ == '__main__':` under Windows)? http://stackoverflow.com/questions/11501048/python3-x-multiprocessing-cycling-without-if-name-main – freakish Jun 05 '13 at 15:22
  • @freakish OS X test machine. I am not wrapping in `if __name__` since this code is either OSX or linux bound (so forking isn't an issue). – Jzl5325 Jun 05 '13 at 15:33

0 Answers