7

I am having trouble using namedtuples inside objects that I want to pass through multiprocessing. I am getting a pickling error. I tried a couple of things from other Stack Overflow posts, but without success. Here is the structure of my code:

package_main, test_module

 import myprogram.package_of_classes.data_object_module
 import ....obj_calculate

 class test(object):
       if __name__ == '__main__':
             my_obj=create_dataobj('myobject',['f1','f2'])
             input = multiprocessing.Queue()
             output = multiprocessing.Queue()
             input.put(my_obj)
             j=Process(target=obj_calculate, args=(input,output))
             j.start()

package_of_classes, data_object_module

 import collections
 import ....load_flat_file

 def get_ntuple_format(obj):
     nt_fields=''
     for fld in obj.fields:
         nt_fields=nt_fields+fld+', '
     nt_fields=nt_fields[0:-2]
     ntuple=collections.namedtuple('ntuple_format',nt_fields)
     return ntuple

 class Data_Obj:
    def __init__(self, name,fields):
        self.name=name
        self.fields=fields
        self.ntuple_form=get_ntuple_format(self)  

    def calculate(self):
        self.file_read('C:/files','division.txt')

    def file_read(self,data_directory,filename):
        output=load_flat_file(data_directory,filename,self.ntuple_form)
        self.data=output

utils_package,utils_module

def create_dataobj(name,fields):
    locals()[name]=Data_Obj(name,fields)
    return locals()[name]  

def obj_calculate(input,output):   
    obj=input.get()
    obj.calculate()
    output.put(obj)

loads_module

def load_flat_file(data_directory,filename,ntuple_form):
     csv.register_dialect('csvrd', delimiter='\t', quoting=csv.QUOTE_NONE)
     ListofTuples=[]
     with open(os.path.join(data_directory,filename), 'rb') as f:
          reader = csv.reader(f,'csvrd')
          for line in reader:
               if line:
                   ListofTuples.append(ntuple_form._make(line))
     return ListofTuples

And the error I am getting is:

PicklingError: Can't pickle <class '__main__.ntuple_format'>: it's not the same object as __main__.ntuple_format

P.S. As I extracted this sample code from a large project, please ignore minor inconsistencies.

Bach
Enes
  • Can you provide the entire traceback? Do you execute this on Windows? Do you have the chance to execute it on Linux (the way data is provided to the child process is entirely different on both systems, on Linux pickling might be skipped)? – Dr. Jan-Philip Gehrcke Mar 10 '14 at 16:05
  • Please provide an [SSCCE](http://sscce.org) for your problem, so that the problematic part is *extracted* from the rest of your project's code, which currently makes it harder to understand which types are involved, where they are declared, what the lifespan of the processes is, etc. Please help us help you! – zmo Mar 10 '14 at 16:16

2 Answers

7

You cannot pickle a class (in this case, a named tuple) that you create dynamically (via get_ntuple_format). For a class to be picklable, it has to be defined at the top level of an importable module.
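To see why, here is a minimal sketch that reproduces the failure with the standard pickle module. The helper names `make_format` and `try_pickle` are made up for illustration and are not part of the question's code. pickle stores a class by reference (module name plus class name), so a class that is only ever returned from a function cannot be found again by that name. The exact error text varies between Python versions.

```python
import collections
import pickle


def make_format(fields):
    # Built at run time: the resulting class is never bound to a
    # module-level name called 'ntuple_format'.
    return collections.namedtuple('ntuple_format', fields)


def try_pickle(obj):
    """Return None on success, or the PicklingError message on failure."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


row = make_format(['f1', 'f2'])('a', 'b')

# The instance cannot be pickled because its class cannot be looked up
# by name in the module where it was created.
error = try_pickle(row)

# The same values as a plain tuple pickle without trouble.
plain_error = try_pickle(tuple(row))
```

Here `error` holds the pickling error message, while `plain_error` stays `None`, which shows that the data itself is fine and only the class lookup fails.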

If you only have a few kinds of tuples you need to support, consider defining them all in advance, at the top level of a module, and then picking the right one dynamically. If you need a fully dynamic container format, consider just using a dict instead.
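A rough sketch of both suggestions, assuming the set of field layouts is known up front. The names `DivisionRow`, `RegionRow`, `FORMATS` and `row_as_dict` are hypothetical and chosen only for this example. It is meant to be run as a script or imported as a module, so that the classes really do live at the top level of an importable module.

```python
import collections
import pickle

# Hypothetical record formats, defined once at module top level.
# Each typename matches the variable it is bound to, so pickle can
# find the class again by looking up "<module>.<name>".
DivisionRow = collections.namedtuple('DivisionRow', ['f1', 'f2'])
RegionRow = collections.namedtuple('RegionRow', ['f1', 'f2', 'f3'])

# Registry used to pick the right predefined class at run time.
FORMATS = {
    ('f1', 'f2'): DivisionRow,
    ('f1', 'f2', 'f3'): RegionRow,
}


def get_ntuple_format(fields):
    """Return the predefined namedtuple class for these field names."""
    return FORMATS[tuple(fields)]


def row_as_dict(fields, values):
    """Fully dynamic alternative: a plain dict needs no class lookup."""
    return dict(zip(fields, values))


# Predefined namedtuple: survives a pickle round trip.
row = get_ntuple_format(['f1', 'f2'])._make(['a', 'b'])
restored = pickle.loads(pickle.dumps(row))

# Dict alternative: works for any field list decided at run time.
record = row_as_dict(['x', 'y'], [1, 2])
restored_record = pickle.loads(pickle.dumps(record))
```

The registry keeps the dynamic choice (which layout to use) separate from the class definitions, which stay static. The dict version gives up attribute access such as `row.f1` in exchange for not needing any predefined class.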

Vasiliy Faronov
3

I'd argue you can pickle a namedtuple, as well as a class defined in __main__.

>>> import dill as pickle
>>> import collections
>>> 
>>> thing = collections.namedtuple('thing', ['a','b'])
>>> pickle.loads(pickle.dumps(thing))
<class '__main__.thing'>

Here's the same thing, used in a class method.

>>> class Foo(object):
...   def bar(self, a, b):
...     thing = collections.namedtuple('thing', ['a','b'])     
...     thing.a = a 
...     thing.b = b
...     return thing 
... 
>>> f = Foo()
>>> q = f.bar(1,2)
>>> q.a
1
>>> q.b
2
>>> q._fields
('a', 'b')
>>> 
>>> pickle.loads(pickle.dumps(Foo.bar))
<unbound method Foo.bar>
>>> pickle.loads(pickle.dumps(f.bar))
<bound method Foo.bar of <__main__.Foo object at 0x10dbf5450>>

You just have to use dill instead of pickle.

Get dill here: https://github.com/uqfoundation

Mike McKerns