I have a classifier object that is larger than 2 GiB, and when I try to pickle it I get this:
cPickle.dump(clf, fo, protocol = cPickle.HIGHEST_PROTOCOL)
OverflowError: cannot serialize a string larger than 2 GiB
I found this question, which describes the same problem, and the suggestions there were to either:
- use Python 3 with protocol 4 - not acceptable, as I need to use Python 2
- use
from pyocser import ocdumps, ocloads
- not acceptable, as I can't use other (non-trivial) modules
- break the object into bytes and pickle each fragment
Is there a way to do so with my classifier? i.e. turn it into bytes, split, pickle, unpickle, concatenate the bytes, and use the classifier?
My code:
from sklearn.svm import SVC
import cPickle
import time

def train_clf(X, y, clf_name):
    start_time = time.time()
    # after many tests, this was found to be the best classifier
    clf = SVC(C=0.01, kernel='poly')
    clf.fit(X, y)
    print 'fit done... {} seconds'.format(time.time() - start_time)
    with open(clf_name, "wb") as fo:
        cPickle.dump(clf, fo, protocol=cPickle.HIGHEST_PROTOCOL)
        # cPickle.HIGHEST_PROTOCOL == 2
        # the error occurs inside the dump method
    return time.time() - start_time
After this, I want to unpickle the classifier and use it:
with open(clf_name, 'rb') as fo:
    clf, load_time = cPickle.load(fo), time.time()