I am testing the Python multiprocessing module to read from different buffer offsets from the same file. The offsets are known a priori and are indexed as 'rows'.
The code looks like this:
import multiprocessing as mp

def get_object(row):
    return file.get(row)  # get() seeks to the row's offset and returns the data object

rows = range(len(file))  # the row ids
pool = mp.Pool()
results = pool.map(get_object, rows)
print results
This raises an OverflowError. There are plenty of posts on this site about overflow errors, but they are generally due to range issues or attempts to build gigantic lists. rows is anywhere from 1,024 to 100,000 items, which is not large at all.
I can print the results and see that the error occurs at the end of the iterable. I believe this has something to do with how map joins the list of objects. map should maintain input order, so ordering should not be the issue.
The error: OverflowError: Python int too large to convert to C long
Update: Looking at the source, the error is raised at line 528 of pool.py (Enthought Python 2.7). This is the get method of the ApplyResult class.
Here is the function and the walkthrough that I believe the code is taking:
def get(self, timeout=None):
    self.wait(timeout)
    if not self._ready:
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value
This is called by map, as shown in the code above. The self._ready check is passing, since I am not seeing a TimeoutError. The self._success check relates to the successful method:
def successful(self):
    assert self._ready
    return self._success
So the get function then rechecks whether the jobs have finished; presumably they have, since we just passed that check in the preceding if statement. The value that is returned, which should be an object (not an int), then overflows.
Ideas?