
I have this dictionary defined by:

import collections

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

Later on, I want to use pickle to dump the dictionary into a file:

import pickle

with open('dict.txt', 'wb') as f:
    pickle.dump(Nwords, f)

However, the code doesn't work and I get an error. Apparently pickle can't serialize a lambda, and I'm better off defining the default factory as a module-level function. I have already read the answers here.
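The failure can be reproduced with a small sketch like the one below. The sample words are made up, and the exact exception type and message depend on the Python version (older versions raise AttributeError for a lambda defined inside a function, newer ones raise pickle.PicklingError), so both are caught.

```python
import collections
import pickle


def train(features):
    # Same as the question's train(): the default factory is a lambda
    # defined inside the function, which pickle cannot look up by name.
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model


model = train(["spam", "eggs", "spam"])

try:
    pickle.dumps(model)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    # The defaultdict itself is fine; it is its default_factory (the lambda)
    # that cannot be pickled.
    pickled_ok = False
    print(type(exc).__name__, exc)
```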

Unfortunately, as I am not experienced with Python, I am not sure exactly how to do this. I tried:

from collections import defaultdict

def dd():
    return defaultdict(int)

def train(features):
##    model = defaultdict(lambda: 1)
    model = defaultdict(dd)
    for f in features:
        model[f] += 1
    return model

I receive the error:

TypeError: unsupported operand type(s) for +=: 'collections.defaultdict' and 'int'

Besides that, returning defaultdict(int) would use 0 as the default for the first occurrence of a key, whereas I want the default to be 1. Any ideas on how I can fix this?

Omid

1 Answer


Unfortunately, the answer there is correct for that question but subtly wrong for yours. Using a top-level function instead of a lambda is the right move, and it does let pickle serialize the dictionary, but the function must return the default value to use, which in your case is not another defaultdict object.

Simply return the same value your lambda returns:

def dd():
    return 1

Every time you look up a key that doesn't yet exist in the defaultdict instance (for example with model[f]), dd is called and its return value is stored under that key. The function in the other post returns another defaultdict instance, one that uses int as its default, which matches the lambda shown in the other question.
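The difference can be seen in a short sketch of the nested factory (a reconstruction based on the question's attempt, not code copied from the other post; the keys are made up). A missing key yields an empty defaultdict(int), which suits two-level counting but cannot be incremented with += 1 itself, and that is exactly the TypeError from the question.

```python
from collections import defaultdict


def dd():
    # Nested factory: each missing outer key gets its own defaultdict(int).
    return defaultdict(int)


model = defaultdict(dd)

inner = model["spam"]          # missing key, so dd() is called
print(type(inner).__name__)    # defaultdict

# The nested shape is meant for two-level keys:
model["spam"]["eggs"] += 1     # inner default is int() == 0, so this becomes 1

error_message = ""
try:
    model["ham"] += 1          # the value is a defaultdict, not a number
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```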

Martijn Pieters