40

Suppose I have a simple Python class definition in a file myClass.py:

class Test:
    A = []

I also have two test scripts. The first script creates an object of type Test, populates the list A, and pickles the result to a file. It then immediately unpickles it from the file, and the list is still populated. The second script only unpickles from the file, and the list is not populated (i.e. A == []). Why is this?

test1.py

import myClass
import pickle

x = myClass.Test()

for i in xrange(5):
    x.A.append(i)

f = open('data', 'wb')
pickle.dump(x, f)
f.close()

f = open('data', 'rb')
y = pickle.load(f)
f.close()

print y.A

and test2.py

import myClass
import pickle

f = open('data', 'rb')
y = pickle.load(f)
f.close()

print y.A
Joe

2 Answers

42

It is because you are setting Test.A as a class attribute instead of an instance attribute. In test1.py, the object read back from the pickle file is the same as the one in test2.py, but it is using the class in memory, whose list you had already modified through x.A.

When your data is unpickled from the file, pickle creates a new instance of the class and then applies whatever instance data was saved. But your only data was a class attribute. The instance always refers back to the class that is in memory, which you modified in one script but not in the other.
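As a minimal sketch of the point above (Python 3, using `pickle.dumps`/`pickle.loads` in a single process instead of the two scripts, and resetting `Test.A` by hand to stand in for a fresh interpreter), the instance carries no state of its own, so nothing about `A` ends up in the pickle:

```python
import pickle


class Test:
    A = []  # class attribute, shared by every instance


x = Test()
for i in range(5):
    x.A.append(i)  # mutates Test.A; nothing is stored on x itself

# The instance has no attributes of its own, so there is no state to pickle.
print(x.__dict__)  # {}

payload = pickle.dumps(x)
y = pickle.loads(payload)

# y.A is looked up on the class currently in memory, not read from the payload.
print(y.A is Test.A)  # True

# Stand-in for a fresh process (like test2.py): reset the class attribute.
Test.A = []
z = pickle.loads(payload)
print(z.A)  # []
```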

Compare the differences in this example:

class Test:
    A = []  # a class attribute
    def __init__(self):
        self.a = []  # an instance attribute

You will notice that the instance attribute a will be pickled and unpickled properly, while the class attribute A will simply refer to the class in memory.

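import pickle

x = Test()

for i in range(5):
    x.A.append(i)  # mutates the shared class attribute
    x.a.append(i)  # mutates this instance's own list

# pickle files must be opened in binary mode
with open('data', 'wb') as f:
    pickle.dump(x, f)

with open('data', 'rb') as f:
    y = pickle.load(f)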

>>> y.A
[0, 1, 2, 3, 4]
>>> y.a
[0, 1, 2, 3, 4]
>>> Test.A
[0, 1, 2, 3, 4]
>>> Test.A = []  # resetting the class attribute
>>> y.a 
[0, 1, 2, 3, 4]
>>> y.A  # refers to the class attribute
[]
jdi
  • Does this mean that if you had pickled the class itself, `pickle.dump(Test)`, and then unpickled the class, you would have gotten the correct list `A` back in both cases? – BallpointBen Nov 11 '16 at 14:20
  • @BallpointBen, no, it wouldn't preserve the class attribute, as per [what-can-be-pickled-and-unpickled](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled): "classes are pickled by named reference, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled" – jdi Nov 15 '16 at 02:00
  • What if the class variable and the instance variable had the same name, would the pickle choose the class or instance variable? – bmc Sep 27 '18 at 02:04
  • If you were to set an instance variable `A` then it would shadow the class variable `A`. At that point you have a value that is divorced from the class variable anyway. So the answer is that you get the instance variable values when you pickle/unpickle – jdi Sep 27 '18 at 02:34
  • Shouldn't it be `open('data','wb')` as per https://stackoverflow.com/a/13906715/2097158 – fcpenha Jun 11 '21 at 21:14
  • @fcpenha yes thanks. This was originally written for python 2. I've updated the open calls as well as the use of range – jdi Jun 12 '21 at 22:31
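
The two points raised in the comments above can be checked with a short sketch (Python 3, in-memory `pickle.dumps`/`pickle.loads`, with `Test.A` reset by hand to mimic a fresh process). It is only an illustration of the behaviour described there, not code from the answer:

```python
import pickle


class Test:
    A = []  # class attribute


Test.A.append(1)

# Pickling the class stores only a reference to it, not its data.
payload = pickle.dumps(Test)
Test.A = []
print(pickle.loads(payload) is Test)  # True: the same class object is looked up
print(pickle.loads(payload).A)        # []: the list contents were never saved

# An instance attribute named A shadows the class attribute and is pickled.
x = Test()
x.A = [1, 2, 3]
y = pickle.loads(pickle.dumps(x))
print(y.A)     # [1, 2, 3]
print(Test.A)  # []
```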
12

This is an old question, but if you are seeing it now, you probably want to define __getstate__ and __setstate__ on your class so that pickle knows how to dump and load it.

See examples here.

If your class is simple (e.g. its instances only have ints and strings as members, plus any methods), it should be picklable automatically.
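
One possible way to apply this to the class from the question is sketched below. It is an assumption about how the hooks might be used here, not the linked examples: `__getstate__` adds a snapshot of the class attribute to the pickled state under the hypothetical key `_saved_A`, and `__setstate__` restores it as an instance attribute.

```python
import pickle


class Test:
    A = []  # class attribute, as in the question

    def __getstate__(self):
        # Copy the instance state and add a snapshot of the class attribute.
        # The key "_saved_A" is a hypothetical name chosen for this sketch.
        state = self.__dict__.copy()
        state["_saved_A"] = list(type(self).A)
        return state

    def __setstate__(self, state):
        # Restore the snapshot as an instance attribute, which shadows Test.A
        # on this object only; the class attribute itself is left untouched.
        state = dict(state)
        self.A = state.pop("_saved_A", [])
        self.__dict__.update(state)


x = Test()
x.A.append(1)  # mutates Test.A
payload = pickle.dumps(x)

Test.A = []  # stand-in for a fresh process where the class attribute is empty
y = pickle.loads(payload)
print(y.A)     # [1]
print(Test.A)  # []
```

Note that the restored list lives on the instance, so it no longer tracks later changes to `Test.A`. If the data really belongs to each object, making it an instance attribute in `__init__` is usually simpler.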

borgr