I am using PySpark, which uses Python's pickle to serialize objects. My use case has a nested defaultdict data structure like:
from collections import defaultdict
nested_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
Pickling this nested defaultdict structure gives:

PicklingError: Can't pickle <function <lambda> at 0x1076cc9d8>: attribute lookup <lambda> on __main__ failed
There's a wonderful workaround in an SO answer for that.
I have been trying that approach and have run into some unintuitive behavior along the way. For example,
import pickle
from collections import defaultdict

def dd():
    def di():
        return defaultdict(int)
    return defaultdict(di)

nested = defaultdict(dd)
pickle.loads(pickle.dumps(nested))
works, but the following doesn't:
def nested_dd():
    def dd():
        def di():
            return defaultdict(int)
        return defaultdict(di)
    return defaultdict(dd)

pickle.loads(pickle.dumps(nested_dd()))
It gives:

AttributeError: Can't pickle local object 'nested_dd.<locals>.dd'
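To make the difference between the two cases concrete, here is a small diagnostic sketch (my own, not from the SO answer) comparing the `__qualname__` of the outer dict's `default_factory` in each case:

```python
from collections import defaultdict

def dd():
    def di():
        return defaultdict(int)
    return defaultdict(di)

def nested_dd():
    def dd():
        def di():
            return defaultdict(int)
        return defaultdict(di)
    return defaultdict(dd)

# Working case: the factory is the module-level function dd
print(defaultdict(dd).default_factory.__qualname__)   # 'dd'

# Failing case: the factory is a function defined inside nested_dd
print(nested_dd().default_factory.__qualname__)       # 'nested_dd.<locals>.dd'
```

So the only visible difference between the two pickled objects is that one factory lives at module level and the other is a local function, yet only the former survives pickling.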
What's happening here?