I have a method inside a class that needs to do a lot of work in a loop, and I would like to spread the work over all of my cores.

I wrote the following code, which works with the built-in map() but raises an error with pool.map().

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)

class OtherClass:
    def run(self, sentence, graph):
        return False

class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        other = OtherClass()

        def single(params):
            sentences, graph = params
            return [other.run(sentence, graph) for sentence in sentences]

        return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

Why can't it pickle single()? I even tried moving single() to the global module scope (outside the class, which makes it independent of the context):

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)

class OtherClass:
    def run(self, sentence, graph):
        return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]

class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

but then I get the following:

Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>

Asked by Amit; edited by Darkonaut
  • Anyway, for your original code: pickling local functions usually doesn't work, although the details are complicated. And, to make it even more fun to debug, if any of the captured variables' values can't be pickled, you get an error message that refers to the function instead of that value. – abarnert Sep 10 '18 at 21:21
  • The solution is to make it a method or a global function, and pass in `delex` as an argument (which you can bind with `functools.partial`) instead of capturing the value. Your modified version should have worked fine; the question is why it's looking in `data.SomeClass.reader`, which doesn't seem like a module at all, instead of in the module, which is presumably `data`. Can you give us a minimal, complete, and verifiable example for that version? – abarnert Sep 10 '18 at 21:23
  • @abarnert I changed both examples to be minimal, complete and verifiable, and updated the errors as well. The reason it was looking in `data.SomeClass.reader` is that this is the file hierarchy, as I have multiple data sources and a reader for each. I removed that and instead wrote a new class that has the same error. – Amit Sep 11 '18 at 08:08
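A minimal sketch of the suggestion in the comments: keep the worker a top-level function, pass the formerly captured value in as an argument, and bind it with `functools.partial`. The earlier `delex` code isn't shown here, so an `OtherClass` instance stands in for the captured value (an assumption). The pickle round trip mimics what Pool does when it ships the callable to a worker, and the built-in map() stands in for pool.map().

```python
import functools
import pickle


class OtherClass:
    def run(self, sentence, graph):
        return False


# Top-level worker: the value it used to capture is now an explicit argument.
def single(other, params):
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]


# Bind the stand-in value; a partial pickles fine when its function is
# top-level and its bound arguments are picklable.
job = functools.partial(single, OtherClass())
restored = pickle.loads(pickle.dumps(job))

results = list(map(restored, zip([["Some string"]], ["string"])))
print(results)  # [[False]]
```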

2 Answers

Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

You solved this error yourself by moving the nested target function single() out to the top level.

Background:

Pool needs to pickle (serialize) everything it sends to its worker processes (IPC). Pickling a function actually saves only its name, and unpickling requires re-importing the function by that name. For that to work, the function needs to be defined at the top level: nested functions won't be importable by the child, and even trying to pickle them raises an exception.
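A small sketch of that behaviour with plain pickle, no Pool involved (the names `top_level`, `make_nested` and `nested` are made up for illustration). The pickled bytes contain the function's name rather than its code, and a nested function, whose qualified name contains `<locals>`, cannot be pickled at all.

```python
import pickle


def top_level(x):
    return x * 2


def make_nested():
    def nested(x):
        return x * 2
    return nested


# A top-level function is pickled as a reference: module name + qualified name.
data = pickle.dumps(top_level)
restored = pickle.loads(data)      # imports the module and looks the name up
print(b"top_level" in data)        # True: the name is in the payload
print(restored is top_level)       # True: the very same function object

nested = make_nested()
print(nested.__qualname__)         # make_nested.<locals>.nested

failed = False
try:
    pickle.dumps(nested)           # no importable path to a '<locals>' name
except (pickle.PicklingError, AttributeError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```

The exception type for the nested case has varied between Python versions, which is why the sketch catches both.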


Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>

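You are starting the pool before you define your functions and classes, so the child processes cannot inherit any of that code. Move the pool startup to the bottom and protect it with if __name__ == '__main__': (with the 'spawn' and 'forkserver' start methods each child re-imports the main module, and an unguarded pool creation would run again there):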

import multiprocessing

class OtherClass:
    def run(self, sentence, graph):
        return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]

class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        return list(pool.map(single, zip(self.sentences, self.graphs)))

if __name__ == '__main__':  # <- prevent RuntimeError for 'spawn'
    # and 'forkserver' start_methods
    with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
        print(SomeClass().some_method())
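The mechanism behind Error 2 can be reproduced without multiprocessing. This is a sketch, assuming it is run as a script so the module is `__main__`. The payload names only the module and `single`, so if the receiving interpreter's copy of the module never executed `def single` (as with workers forked before the definition ran), the lookup fails. Deleting the name from the module namespace simulates such a worker.

```python
import pickle


def single(params):
    sentences, graph = params
    return [(sentence, graph) for sentence in sentences]


# Pickling stores only the module ('__main__' when run as a script) and the name.
payload = pickle.dumps(single)

# Simulate a worker whose module copy never ran `def single`: drop the name.
saved = globals().pop("single")
missing = False
try:
    pickle.loads(payload)
except (AttributeError, pickle.UnpicklingError) as exc:
    missing = True
    print(exc)  # similar to: Can't get attribute 'single' on <module '__main__' ...>
finally:
    globals()["single"] = saved  # put the definition back

# With the name present again, the same payload unpickles fine.
restored = pickle.loads(payload)
print(restored((["Some string"], "string")))  # [('Some string', 'string')]
```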

Appendix

...I would like to spread the work over all of my cores.

Potentially helpful background on how multiprocessing.Pool chunks work:

Python multiprocessing: understanding logic behind chunksize
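As a rough illustration of that chunking: when pool.map() is called without chunksize, CPython's Lib/multiprocessing/pool.py (Pool._map_async) derives one from the number of items and workers. The helper below mirrors that arithmetic as a sketch. It is an implementation detail rather than documented API, and the name `default_chunksize` is hypothetical.

```python
def default_chunksize(n_items: int, n_workers: int) -> int:
    """Mirror CPython's default chunksize for Pool.map (implementation detail).

    Assumption: based on Pool._map_async in Lib/multiprocessing/pool.py,
    which may change between Python versions.
    """
    # Aim for roughly four chunks per worker, rounding up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


# 100 tasks on 4 workers: divmod(100, 16) == (6, 4), rounded up to 7 per chunk.
print(default_chunksize(100, 4))  # 7
```

Passing an explicit chunksize to pool.map() overrides this default.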

Answered by Darkonaut

I accidentally discovered a very nasty solution. It works as long as you use a def statement: declare the function you want to pass to Pool.map with the global keyword at the beginning of the enclosing method. Declaring it global gives the nested function a top-level qualified name, so pickle can look it up by name in the module. But I would not rely on this in serious applications.

import multiprocessing

class OtherClass:
    def run(self, sentence, graph):
        return False

class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        global single  # This is ugly, but does the trick XD

        other = OtherClass()

        def single(params):
            sentences, graph = params
            return [other.run(sentence, graph) for sentence in sentences]

        # Workers look `single` up by name in __main__, so they must be forked
        # after it exists: create the pool here, and only with the 'fork' start method.
        with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
            return list(pool.map(single, zip(self.sentences, self.graphs)))


if __name__ == '__main__':
    print(SomeClass().some_method())
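Why the trick gets past the pickling error, as a small sketch with hypothetical names (`factory`, `exported`, `hidden`). Since Python 3.4, CPython gives a nested function that is declared global a top-level `__qualname__`, so pickle can resolve it by name. Only the name is pickled, not the function or its closure, so the receiving process still needs its own `exported` in the module. In a Pool, that means the worker must have been forked after the definition ran.

```python
import pickle


def factory():
    global exported  # binds the nested def in the module namespace

    def exported(x):
        return x + 1

    def hidden(x):
        return x + 1

    return hidden


hidden = factory()
print(exported.__qualname__)  # exported
print(hidden.__qualname__)    # factory.<locals>.hidden

# Because its qualified name is top-level, `exported` pickles by reference.
restored = pickle.loads(pickle.dumps(exported))
print(restored is exported)   # True
```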