
I use an Ubuntu machine for developing code and Windows for deployment. I pickled a spaCy vectorizer object with dill on my Ubuntu machine, and now, whenever I try to load it back (un-pickle it) on a Windows machine, I get the error below.

I have tried converting the paths to PurePath, PurePosixPath, etc. before pickling, but I still get the same error.

On the Ubuntu machine:

import dill as pickle
import numpy as np
import spacy
...

class SpacyVectorizer(object):

    def __init__(self):
        self.nlp = spacy.load('en_core_web_md')

    def fit(self, X, y=None):
        return self    

    def transform(self, X):
        doc_vector = [self.nlp(doc).vector for doc in X]
        doc_vector = np.array(doc_vector)
        return doc_vector

    def fit_transform(self, X, y=None):
        return self.transform(X)



MyVectorizer = SpacyVectorizer()

# here I have tried PurePath and other pathlib classes, but none of them work
with open(r'some_path/path/Vect.pkl', 'wb') as f:
    pickle.dump(MyVectorizer, f)

On the Windows machine:

import dill as pickle

obj = pickle.load(open(r'somepath/path/Vect.pkl', 'rb'))

ERROR

NotImplementedError                       Traceback (most recent call last)
<ipython-input-32-8553881dccfa> in <module>
----> 1 obj = pickle.load(open(r'somepath/path/Vect.pkl', 'rb'))

D:\Python\envs\Python_BASE\lib\site-packages\dill\_dill.py in load(file, ignore)
    303     # apply kwd settings
    304     pik._ignore = bool(ignore)
--> 305     obj = pik.load()
    306     if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
    307         if not ignore:

D:\Python\envs\Python_BASE\lib\pathlib.py in __new__(cls, *args, **kwargs)
    970         if not self._flavour.is_supported:
    971             raise NotImplementedError("cannot instantiate %r on your system"
--> 972                                       % (cls.__name__,))
    973         self._init()
    974         return self

NotImplementedError: cannot instantiate 'PosixPath' on your system

I know this can be avoided by simply loading a fresh spaCy model on Windows, since this vectorizer is not fitted on any training data, but I want to understand what causes the error and how to fix it. The same pickling approach works for every other vectorizer I have tried, including ones fitted on training data such as TF-IDF; only this one fails.

I found a related question, "pathlib.py: Instantiating 'PosixPath' on Windows", but it doesn't help.

James Z
Pranzell

  • I'm seeing a similar problem. No answers or a workaround yet? – Eric Hansen Oct 09 '19 at 18:58
  • A SpacyVectorizer object cannot be pickled with a preloaded language model (sm/md/lg) and then loaded on a different OS. Once the class has been initialized with a language model and pickled on one OS (such as Ubuntu), the pickle also contains the OS-specific path of the loaded model, so un-pickling it on a different OS raises the "PosixPath" error. There is no fix once a language model has been loaded with an OS path. As a workaround, pickle an instance of the class without a loaded language model on one OS, load the pickled file on the other OS, and initialize it there with a language model installed on that OS. – Pranzell Feb 06 '20 at 06:12
  • Pranzell, that's really helpful. Thanks! I'll give that a try. For context, my issue was creating Docker containers on Windows but deploying those containers to run on Linux. Tricky stuff. – Eric Hansen Feb 07 '20 at 15:27
  • No problem! Glad I could help. @EricHansen – Pranzell Apr 02 '20 at 15:02

1 Answer


The reason you received the error is explained in @Pranzell's comment on the question.
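To see where the PosixPath comes from, here is a minimal stdlib-only sketch (the model path is a hypothetical example). Pickling a pathlib object records its concrete class name, not just the path string. A PurePosixPath can be created on any OS, so it round-trips everywhere; on Linux, however, a concrete pathlib.Path is a PosixPath, and when a pickled PosixPath is loaded on Windows, pickle calls PosixPath(...), which is what raises the NotImplementedError in the traceback. This also explains why converting the file path passed to open() did not help: per the comment above, the offending path is stored inside the loaded language model, not in the file name.

```python
import pathlib
import pickle

# Hypothetical model location; PurePosixPath can be instantiated on any OS.
model_dir = pathlib.PurePosixPath("/home/user/models/en_core_web_md")

data = pickle.dumps(model_dir)

# The pickle stores the class name alongside the path components.
print(b"PurePosixPath" in data)  # True

# Unpickling re-creates the same class (here it works because it is a *pure* path).
restored = pickle.loads(data)
print(type(restored).__name__)   # PurePosixPath
print(restored.as_posix())       # /home/user/models/en_core_web_md

# A plain str carries no class information and round-trips on any OS.
print(pickle.loads(pickle.dumps(str(model_dir))))
```

Storing paths as plain strings (or not pickling the loaded model at all, as below) keeps the pickle free of OS-specific classes.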

The solution I used was to add a make_pickle_able method to the class:

import dill as pickle
import numpy as np
import spacy
...

class SpacyVectorizer(object):

    def __init__(self):
        self.nlp = spacy.load('en_core_web_md')

    def fit(self, X, y=None):
        return self    

    def transform(self, X):
        # After unpickling, self.nlp is None, so reload the spaCy model on first use.
        if self.nlp is None:
            self.reinitiate_spacy()
        doc_vector = [self.nlp(doc).vector for doc in X]
        doc_vector = np.array(doc_vector)
        return doc_vector

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def make_pickle_able(self):
        # Drop the loaded model (and the OS-specific path it carries) before pickling.
        self.nlp = None
        return self

    def reinitiate_spacy(self):
        self.nlp = spacy.load('en_core_web_md')

And then use it like this:

MyVectorizer = SpacyVectorizer()
MyVectorizer.make_pickle_able()
with open(r'some_path/path/Vect.pkl', 'wb') as f:
    pickle.dump(MyVectorizer, f)

Note also that transform calls reinitiate_spacy the first time it runs after unpickling, because self.nlp is None at that point:

if self.nlp is None:
    self.reinitiate_spacy()
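A variant of the same idea, sketched under the assumption that the model name is the only setting that needs to survive pickling: instead of calling make_pickle_able() by hand, implement __getstate__/__setstate__, which pickle (and dill, which builds on it) call automatically, so the loaded pipeline is never written to the file. The nlp property and the _nlp attribute are names introduced for this sketch, not part of spaCy; spacy.load is only called on first use, so the object can be pickled and unpickled without loading a model.

```python
import pickle
from typing import Any, Iterable, List, Optional


class SpacyVectorizer:
    """Sketch: pickle only OS-neutral settings; load the spaCy pipeline lazily."""

    def __init__(self, model_name: str = "en_core_web_md"):
        self.model_name = model_name      # a plain str pickles the same on every OS
        self._nlp: Optional[Any] = None   # loaded pipeline; excluded from the pickle

    @property
    def nlp(self) -> Any:
        # Load on first use, on whichever machine is running the code.
        if self._nlp is None:
            import spacy  # deferred so unpickling does not require loading a model
            self._nlp = spacy.load(self.model_name)
        return self._nlp

    def fit(self, X: Iterable[str], y: Any = None) -> "SpacyVectorizer":
        return self

    def transform(self, X: Iterable[str]) -> List[Any]:
        # The original wraps this list in np.array(...); kept as a list here.
        return [self.nlp(doc).vector for doc in X]

    def fit_transform(self, X: Iterable[str], y: Any = None) -> List[Any]:
        return self.transform(X)

    def __getstate__(self) -> dict:
        # Called by pickle/dill: copy the attributes and drop the pipeline,
        # which (per the comment above) carries the model's OS-specific path.
        state = self.__dict__.copy()
        state["_nlp"] = None
        return state

    def __setstate__(self, state: dict) -> None:
        # Restore the settings; the pipeline is reloaded lazily by `nlp`.
        self.__dict__.update(state)
```

With this, the Ubuntu side can pickle SpacyVectorizer() directly, and on Windows the first transform call loads en_core_web_md from that machine's own installation. The original object keeps its pipeline, because only the copied state is modified.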
Joel