
I am using PySpark, which uses Python's pickle module to serialize objects. My use case has a nested defaultdict data structure like:

from collections import defaultdict

nested_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

Pickling this nested defaultdict structure gives

PicklingError: Can't pickle <function <lambda> at 0x1076cc9d8>: attribute lookup <lambda> on __main__ failed

There's a wonderful workaround in an SO answer for that.

I have been trying it and running into some unintuitive behavior along the way. For example,

import pickle

def dd():
    def di():
        return defaultdict(int)
    return defaultdict(di)

nested = defaultdict(dd)
pickle.loads(pickle.dumps(nested))

works, but the following doesn't:

def nested_dd():
    def dd():
        def di():
            return defaultdict(int)
        return defaultdict(di)
    return defaultdict(dd)

pickle.loads(pickle.dumps(nested_dd()))

It gives

AttributeError: Can't pickle local object nested_dd.<locals>.dd

What's happening here?

kamalbanga
  • Try with `pickle.loads(pickle.dumps(nested_dd))` instead of `pickle.loads(pickle.dumps(nested_dd()))` – shaik moeed Aug 13 '19 at 10:27
  • @shaikmoeed: no change. Anyway, what I want is a 3-level-deep defaultdict, so I'll be instantiating a `nested_dd` anyway. When I try `nested_dd3 = nested_dd()` I get the same error. – kamalbanga Aug 13 '19 at 10:51

1 Answer


While other serialization techniques exist, you can pickle only functions that can be found by `from … import foo` (because that's what unpickling a function does). Your "working" example will fail as soon as the outer defaultdict isn't empty, since each nested dictionary would then have a local function as its default factory, as the sketch below shows.
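To see this concretely, here is a minimal sketch (not part of the original answer; the keys are arbitrary). The empty dict pickles fine because its only factory is the module-level dd, but inserting anything creates an inner dict whose factory is the local di:

import pickle
from collections import defaultdict

def dd():
    def di():
        return defaultdict(int)
    return defaultdict(di)

nested = defaultdict(dd)
pickle.loads(pickle.dumps(nested))  # fine: the only factory is the module-level dd

nested['a']['b']['c'] = 1  # nested['a'] now has dd.<locals>.di as its factory
pickle.dumps(nested)       # AttributeError: Can't pickle local object 'dd.<locals>.di'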

In this case, since none of these functions close over anything, you can just write them at top level.
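For example, a sketch with the question's factories hoisted to module level:

import pickle
from collections import defaultdict

# Every factory is a module-level function, so pickle can find each one by name.
def di():
    return defaultdict(int)

def dd():
    return defaultdict(di)

def nested_dd():
    return defaultdict(dd)

nested = nested_dd()
nested['a']['b']['c'] = 1
restored = pickle.loads(pickle.dumps(nested))
assert restored['a']['b']['c'] == 1

Since di, dd, and nested_dd are all importable by name, the dict round-trips even when populated.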

Davis Herring