
I need to save my training data set in a pickle file. Here is the code. When I execute it, I get an error. How do I fix it? I need to save the featureCounts and labelCounts variables in two pickles.

from __future__ import division
import collections
import math
import pickle

class TrainClassifier:
    def __init__(self, arffFile):
        self.trainingFile = arffFile
        self.features = {}
        self.featureNameList = []
        self.featureCounts = collections.defaultdict(lambda: 1)
        self.featureVectors = []
        self.labelCounts = collections.defaultdict(lambda: 0)

    def DataTraning(self):
        for fv in self.featureVectors:
            self.labelCounts[fv[len(fv)-1]] += 1  # update count of the label
            for counter in range(0, len(fv)-1):
                self.featureCounts[(fv[len(fv)-1], self.featureNameList[counter], fv[counter])] += 1

        for label in self.labelCounts:
            for feature in self.featureNameList[:len(self.featureNameList)-1]:
                self.labelCounts[label] += len(self.features[feature])

    def GetValues(self):
        file = open(self.trainingFile, 'r')

        for line in file:
            if line[0] != '@':  #start of actual data
                self.featureVectors.append(line.strip().lower().split(','))
            else:   #feature definitions
                if line.strip().lower().find('@data') == -1 and (not line.lower().startswith('@relation')):
                    self.featureNameList.append(line.strip().split()[1])
                    self.features[self.featureNameList[len(self.featureNameList) - 1]] = line[line.find('{')+1: line.find('}')].strip().split(',')

        file.close()

    def SaveOnPickle(self):
        f = open('dict.pickle', 'wb')
        pickle.dump(self.labelCounts, f)
        f.close()

if __name__ == "__main__":
    Predic = TrainClassifier("Military.arff")
    Predic.GetValues()
    Predic.DataTraning()
    Predic.SaveOnPickle()

Here is the error:

Traceback (most recent call last):
  File "C:\wamp64\www\M360\M360py\src\TrainClassifier.py", line 69, in <module>
    Predic.SaveOnPickle()
  File "C:\wamp64\www\M360\M360py\src\TrainClassifier.py", line 43, in SaveOnPickle
    pickle.dump(self.labelCounts, f)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 1370, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 562, in save_tuple
    save(element)
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Users\Udara\AppData\Roaming\NetBeans\8.1\jython-2.7.0\Lib\pickle.py", line 746, in save_global
    raise PicklingError(
pickle.PicklingError: Can't pickle <function <lambda> at 0x5>: it's not found as __main__.<lambda>
Lakmal Geekiyanage
  • I wouldn't say it's an exact duplicate. In that particular case, there was no need to serialize a function, a simpler solution could be used. Interesting link question/answers though! – Jean-François Fabre Sep 01 '16 at 08:43

1 Answer


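You cannot serialize self.labelCounts because it is a defaultdict (no problem with that) whose default factory is a lambda. Here's the catch: pickle stores a function by its module and name, and a lambda has no name that pickle can look up, which is exactly what "it's not found as __main__.<lambda>" is telling you.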
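A minimal sketch of that behaviour, separate from the class in the question. The helper name try_pickle is made up for this illustration, and the exact exception type may differ between Python implementations and versions, so several are caught:

```python
import collections
import pickle


def try_pickle(obj):
    """Return "ok" if pickle can serialize obj, else a short failure note."""
    try:
        pickle.dumps(obj)
        return "ok"
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return "failed: %s" % type(exc).__name__


# A built-in function is stored by name, so this works.
print(try_pickle(len))

# A lambda has no importable name, so pickle gives up on it.
print(try_pickle(lambda: 0))

# A defaultdict carries its factory along, so it fails for the same reason.
print(try_pickle(collections.defaultdict(lambda: 0)))
```

The dictionary contents are not the problem here; only the factory function attached to the defaultdict is.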

You wrote:

self.labelCounts = collections.defaultdict(lambda: 0)

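But you are lucky: you don't need a lambda here. The argument to defaultdict must be a callable (passing the value 0 itself raises "TypeError: first argument must be callable"), and the built-in int called with no arguments returns 0. Pickle can find int by name, so just do:

self.labelCounts = collections.defaultdict(int)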

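(Of course it's the same problem for your other dict, featureCounts, but the fix differs slightly: its default is 1, and no built-in returns 1 when called without arguments. Define a named function at module level, for example def one(): return 1 (the name one is only an illustration), and pass it as the factory. Pickle stores it by name, so the script that loads the pickle must define the same function.) Do that:

self.featureCounts = collections.defaultdict(one)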
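An alternative that stores no factory function at all is to convert each counter to a plain dict before dumping and wrap it in a defaultdict again after loading. The sketch below shows this for two separate pickle files. The helper names save_counts and load_counts, the file names, and the sample keys are assumptions for the illustration, not part of the original code, and a temporary directory stands in for the real output location:

```python
import collections
import os
import pickle
import tempfile


def save_counts(counts, path):
    """Pickle a counter as a plain dict, so no default factory is stored."""
    with open(path, "wb") as f:
        pickle.dump(dict(counts), f)


def load_counts(path, default):
    """Load a pickled dict and wrap it in a defaultdict with the given default."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    # This lambda only lives in memory; it is never passed to pickle.
    return collections.defaultdict(lambda: default, data)


# Stand-ins for self.labelCounts and self.featureCounts after training.
label_counts = collections.defaultdict(int)
feature_counts = collections.defaultdict(lambda: 1)
label_counts["yes"] += 3
feature_counts[("yes", "rank", "major")] += 1

# Hypothetical file names, one pickle per variable.
out_dir = tempfile.mkdtemp()
label_path = os.path.join(out_dir, "labelCounts.pickle")
feature_path = os.path.join(out_dir, "featureCounts.pickle")

save_counts(label_counts, label_path)
save_counts(feature_counts, feature_path)

restored_labels = load_counts(label_path, 0)
restored_features = load_counts(feature_path, 1)
```

Inside the class, SaveOnPickle could call such a helper once for self.labelCounts and once for self.featureCounts. The trade-off is that the default value has to be supplied again at load time, since it is no longer stored in the file.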
Jean-François Fabre
  • I need to save featureCounts as well. Can I remove the lambda from it too? What is the reason for using a lambda for those variables? – Lakmal Geekiyanage Sep 01 '16 at 08:17
  • When I try your solution, I get this error: `Traceback (most recent call last): File "C:\wamp64\www\M360\M360py\src\TrainClassifier.py", line 66, in <module> Predic = TrainClassifier("C:/Users/Udara/Desktop/CDAP/Museum 360 (System)/Nadeeraka/Dataset/Military.arff") File "C:\wamp64\www\M360\M360py\src\TrainClassifier.py", line 14, in __init__ self.featureCounts = collections.defaultdict(list()) TypeError: first argument must be callable` – Lakmal Geekiyanage Sep 01 '16 at 08:30
  • `self.featureCounts = collections.defaultdict(1) self.labelCounts = collections.defaultdict(0)` This is the code I put in place of the original lines, but it produces the same error as above. – Lakmal Geekiyanage Sep 01 '16 at 08:42
  • you got it. Tell me if it works. – Jean-François Fabre Sep 01 '16 at 08:44