I have a function that takes a list of objects, two lists of int and an int (an ID) as parameters, and returns a tuple of two lists of int. The function works well, but it takes a long time as my list of IDs grows. Having already used multiprocessing in other projects, I thought this was a good fit for a multiprocessing Pool.

However, I get an error _pickle.PicklingError when launching it.

I have spent the past few days looking for alternative ways of doing this: I discovered pathos ProcessPool, which runs forever with no indication of the problem. I also tried ThreadingPool, as an accepted answer suggested, but it is not suited to my problem since it does not use multiple CPUs and doesn't speed up the process.

Here is a sample of my function. It is not a reproducible example since it is specific to my case, but I believe the function is pretty clear: it returns a tuple of two lists, built in a for loop.

def getNormalOnConnectedElements(elem, mapping, idList, node):
    normalZ = []
    eids = []
    for e in mapping[node]:
        if e in idList:
            normalZ.append(elem[e].Normal()[2])
            eids.append(e)
    return normalZ, eids

I tried calling it as I usually do:

from functools import partial
from itertools import repeat
from multiprocessing import Pool

with Pool(4) as p:
    # with functools.partial()
    result = p.map(partial(getNormalOnConnectedElements, elemList, mapping, idList), nodeList)
    # or with itertools.repeat()
    result = p.starmap(getNormalOnConnectedElements, zip(repeat(elemList), repeat(mapping), repeat(idList), nodeList))

I made sure the function is defined at the top level and that the call is within an if __name__ == "__main__": block.

So the question is: what in this function causes pickle to throw _pickle.PicklingError?

Edit, with the full traceback:

  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/TLEP6OQM/Documents/Anaconda/PLoad tool/model.py", line 209, in <module>
    allVec = p.map(partial(getNormalOnConnectedElements, elem, allElemIds, mapping), myFilter)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 290, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 683, in get
    raise self._value
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 457, in _handle_tasks
    put(task)
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\ProgramData\Anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function getNormalOnConnectedElements at 0x00000257E6785620>: attribute lookup getNormalOnConnectedElements on __main__ failed
Titouan L
  • We really need a [MCVE] here; many subtle things can break pickling in the `multiprocessing` case, and you've almost certainly omitted them. For example, is `getNormalOnConnectedElements` defined inside or outside of the `if __name__ == "__main__":`? Do any of the objects involved do internal locking (and therefore contain an unpickleable `threading.Lock`)? etc. Also, side-note, what you have provided clearly wouldn't work anyway (you called `p.startmap`, not `p.starmap`). Providing a traceback that says *what* type it couldn't pickle would also be helpful. – ShadowRanger Mar 23 '22 at 13:15
  • Sorry, `startmap` was a typo, I corrected it. As for the function, I did write that it is defined at the top level; maybe I understood that wrong, but I meant that it is defined outside of the `if`, right below the imports. Added the traceback as well. – Titouan L Mar 23 '22 at 13:20
  • That traceback makes it very clear the issue is `getNormalOnConnectedElements` itself, and it's behaving as if it was *not* defined at the top level of your main script (a common reason for this would be if it was only defined conditionally when `__name__ == '__main__'`, as the fork emulation that `multiprocessing` uses imports the script in each child under the name `'__mp_main__'` so that script behaviors don't run). Take a careful look at all [the programming guidelines](https://docs.python.org/3/library/multiprocessing.html#all-start-methods); you're likely violating at least one. – ShadowRanger Mar 23 '22 at 13:39
  • Thank you very much. It does indeed behave as if it was not defined; the reason is my bad habit of running this particular file with "Run File in Python Console" instead of running the file itself. It worked instantly when running the file the proper way ... – Titouan L Mar 23 '22 at 13:43
  • Yeah, you *cannot* run `multiprocessing` programs that way in many (most?) IDEs; they get run in weird environments that violate the expectations of `multiprocessing`. e.g. it's not uncommon for the IDE to run its own *actual* main script as a wrapper that invokes yours by lying to it and saying it's still `'__main__'`. Problem is, if you actually look at `sys.modules['__main__']` (which is what `pickle` does to figure out if the function it was passed can be found by qualified name lookup), it finds the IDE's wrapper script, not your script, and the wrapper script doesn't define your function. – ShadowRanger Mar 23 '22 at 13:48
  • A fun way this can occur without IDE involvement is when you try to profile your code; [profiling tools use the same trick to "be main, but pretend what they're profiling is main", and it causes the same problem](https://stackoverflow.com/q/53890693/364696). A similar issue can occur when a type misreports its qualified name; normally, the qualified name is set automatically, but [for `namedtuple`s it relies on the user to type the same name twice](https://stackoverflow.com/a/48030625/364696), and [private method name-mangling can break it as well](https://stackoverflow.com/a/57497698/364696). – ShadowRanger Mar 23 '22 at 14:54

1 Answer

If anyone stumbles upon this question: the error occurred, even with a very simple function, because of the way I was running the Python script. As ShadowRanger explains well in the comments, the function needs to be defined at the top level of the main script. Within PyCharm, "Run File in Python Console" does not simply run the file; it runs it through a wrapper, so pickle looks for the function on the wrapper's __main__ instead of on my script and does not find it.

Running the file the proper way, or calling python myscript.py, raises no error.
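
For illustration, here is a minimal sketch of the lookup that fails, reproduced without PyCharm or multiprocessing. It relies only on the fact that pickle stores a function by reference (module name plus qualified name) and resolves that reference through sys.modules. The names get_normals and fake_main are hypothetical stand-ins for getNormalOnConnectedElements and '__main__'; the real wrapper PyCharm uses is not reproduced here.

```python
import pickle
import sys
import types


def get_normals(node):
    # Hypothetical stand-in for getNormalOnConnectedElements.
    return node * 2


# Hypothetical module name playing the role of '__main__'.
MAIN = "fake_main"

# Case 1: the module registered under that name really defines the function,
# which is the situation when the script is run with `python myscript.py`.
script = types.ModuleType(MAIN)
script.get_normals = get_normals
get_normals.__module__ = MAIN
sys.modules[MAIN] = script

payload = pickle.dumps(get_normals)      # only a reference is stored
restored = pickle.loads(payload)         # resolved via sys.modules[MAIN]
same_object = restored is get_normals

# Case 2: a wrapper module has taken over the name and does not define the
# function, which is roughly what happens under an IDE console or a profiler.
sys.modules[MAIN] = types.ModuleType(MAIN)
try:
    pickle.dumps(get_normals)
    failed = False
except pickle.PicklingError as exc:
    failed = True
    print(exc)                           # exact wording depends on the Python version
finally:
    del sys.modules[MAIN]                # clean up the fake module
```

In the second case the function itself has not changed at all; only the module found under its recorded name has, which is why the error appears even for a trivial top-level function.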

Titouan L