
I have the typical Flask structure in my project. Everything was working fine until I tried to load a pickled object inside my Flask app. I created the pickled object with a different Python script, and it depends on some custom classes. I think the problem is that the object was pickled from inside __main__, so unpickling expects the classes to be located there, but I haven't figured out how to sort it out. I tried adding the classes to pipeline_classes.py and importing them, but it did not work. Any ideas would be appreciated.
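The suspicion above matches how pickle works: an instance is stored together with a reference to its class (the defining module's name plus the class name), not the class's code. A minimal standard-library sketch, with a stripped-down stand-in for ItemSelector (no scikit-learn needed):

```python
import pickle


class ItemSelector:
    """Stripped-down stand-in for the real transformer; only the class reference matters."""

    def __init__(self, column):
        self.column = column


# Pickle the instance's data plus a *reference* to its class.
payload = pickle.dumps(ItemSelector("TEXT"))

# The reference is the defining module's name and the class name.
# When this file is run directly as a script, that module name is "__main__".
print(ItemSelector.__module__)
print(ItemSelector.__module__.encode() in payload, b"ItemSelector" in payload)
```

Run directly as a script, this prints __main__ and then True True, so the pickle points at __main__.ItemSelector. In the Flask process, __main__ is run.py, which defines no such class, which is consistent with the AttributeError below.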

This is the script that produced the pickled object:

train.py

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
import pickle
import joblib  # formerly sklearn.externals.joblib, which was removed in scikit-learn 0.23
from sklearn.pipeline import FeatureUnion
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.base import BaseEstimator, TransformerMixin

class ItemSelector(BaseEstimator, TransformerMixin):

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        return (X[self.column])


class TextStats(BaseEstimator, TransformerMixin):
    """Extract features from each document for DictVectorizer"""

    def fit(self, x, y=None):
        return self

    def transform(self, posts):
        return [{'REPORT_M': text}
                for text in posts]


def train():
    data = joblib.load('data_df.pkl')

    # train and predict
    classifier = Pipeline([
                ('union', FeatureUnion([

                        ('text', Pipeline([
                            ('selector', ItemSelector(column='TEXT')),
                            ('tfidf_vec', TfidfVectorizer(max_df=0.8)),
                        ])),

                        ('category', Pipeline([
                            ('selector', ItemSelector(column='CATEGORY')),
                            ('stats', TextStats()),
                            ('vect', DictVectorizer())
                        ]))

                ])),
                ('clf', ExtraTreesClassifier(n_estimators=30, max_depth=300, min_samples_split=6, class_weight='balanced'))])

    classifier.fit(data, data.y)
    joblib.dump(classifier, 'et.pkl')

if __name__ == '__main__':
    train()

Then there is my flask app where I try to load that pickled object.

__init__.py

from flask import Flask
from .pipeline_classes import ItemSelector
from .pipeline_classes import TextStats

app = Flask(__name__)
app.config.from_object('config')

from app import views

run.py

from app import app
app.run(debug=True)

views.py

from app import app
from flask import render_template
from .load import load

@app.before_first_request
def load_classifier():
    print("data loading")
    global loaded
    loaded = load()
    print("data loaded")

load.py

import pickle
import pandas as pd

def load():
    clf_ = pd.read_pickle('et.pkl')
    return clf_  # return the classifier so views.py can store it in `loaded`

I get the following error:

AttributeError: module '__main__' has no attribute 'ItemSelector'

with traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1470, in full_dispatch_request
    self.try_trigger_before_first_request_functions()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1497, in try_trigger_before_first_request_functions
    func()
  File "/home/q423446/server/app/views.py", line 17, in load_classifier
    loaded = load()
  File "/home/q423446/server/app/load.py", line 11, in load
    clf_ = pd.read_pickle('app/ml/et_30.pkl')
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/pickle.py", line 68, in read_pickle
    return try_read(path, encoding='latin1')
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/pickle.py", line 62, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "/usr/local/lib/python3.5/dist-packages/pandas/compat/pickle_compat.py", line 117, in load
    return up.load()
  File "/usr/lib/python3.5/pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.5/pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python3.5/pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'ItemSelector'
Vas

1 Answer


Try changing the bottom of your first file, pipeline_classes.py, to this:

if __name__ == "__main__":
    ItemSelector.__module__ = "pipeline_classes"
    TextStats.__module__ = "pipeline_classes"
    train()

Try reading this http://stefaanlippens.net/python-pickling-and-dealing-with-attributeerror-module-object-has-no-attribute-thing.html
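The trick above works because pickle records whatever __module__ says. When dumping, pickle also looks the class up under that module and name and requires it to be the very same object, so pipeline_classes must be importable when dumping and again when loading (which may be behind the "No module named pipeline_classes" error mentioned in the comments). A standard-library sketch of the mechanism; the in-memory module registered in sys.modules is an assumption standing in for a real pipeline_classes.py file:

```python
import pickle
import sys
import types

# Hypothetical stand-in for a real pipeline_classes.py: an importable module
# registered under that name so pickle can find it.
pipeline_classes = types.ModuleType("pipeline_classes")
sys.modules["pipeline_classes"] = pipeline_classes


class ItemSelector:
    def __init__(self, column):
        self.column = column


# Point the class at the importable module...
ItemSelector.__module__ = "pipeline_classes"
# ...and make that module expose this same class object, because pickle
# checks pipeline_classes.ItemSelector is ItemSelector while dumping.
pipeline_classes.ItemSelector = ItemSelector

payload = pickle.dumps(ItemSelector("TEXT"))
restored = pickle.loads(payload)

print(b"pipeline_classes" in payload)        # the pickle now names pipeline_classes
print(type(restored) is ItemSelector, restored.column)
```

In a real project, the simpler form of the same idea is to define the classes only in the app's pipeline_classes.py and import them in train.py (assuming the package is importable from where train.py runs), so __module__ is already correct without overriding it. That still needs one more training run.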

codyc4321
  • Thank you for your reply, but this means I would have to rerun the script, which takes too long. Isn't there another workaround? – Vas Jul 14 '17 at 15:51
  • Also, my file name is not "pipeline_classes". That's a separate file I created to hold these two classes, which I then try to import from the Flask app. So I guess it should be ItemSelector.__module__ = "train" – Vas Jul 14 '17 at 16:09
  • I tried it, but it gives the error "No module named pipeline_classes". I wrote from .pipeline_classes import "class". I also tried import pipeline_classes, but I still get the same error. – Vas Jul 18 '17 at 07:18
  • hmm, seems you're in a pickle :) – codyc4321 Jul 18 '17 at 14:29
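For the "no retraining" question in the comments, one common workaround (not from the answer above) is to make the classes visible on the running __main__ module before loading, so the existing pickle's __main__.ItemSelector reference resolves. A minimal standard-library sketch under these assumptions: simulate_training_pickle and expose_on_main are hypothetical helper names, the fake __main__ swap exists only to produce a pickle like the one train.py wrote, and the top-level ItemSelector stands in for the class imported from pipeline_classes.py:

```python
import pickle
import sys
import types


def simulate_training_pickle():
    """Hypothetical demo helper: produce a pickle whose class is recorded as
    __main__.ItemSelector, as a script run directly would."""

    class ItemSelector:
        def __init__(self, column):
            self.column = column

    ItemSelector.__module__ = "__main__"
    ItemSelector.__qualname__ = "ItemSelector"
    fake_main = types.ModuleType("__main__")
    fake_main.ItemSelector = ItemSelector

    # Briefly swap in the fake __main__ so pickle's class lookup succeeds.
    real_main = sys.modules["__main__"]
    sys.modules["__main__"] = fake_main
    try:
        return pickle.dumps(ItemSelector("TEXT"))
    finally:
        sys.modules["__main__"] = real_main


class ItemSelector:
    """Stand-in for the class imported from pipeline_classes.py in the app."""

    def __init__(self, column):
        self.column = column


def expose_on_main(*classes):
    """Attach classes to the running __main__ module so pickles that
    reference __main__.<ClassName> resolve to them."""
    main_module = sys.modules["__main__"]
    for cls in classes:
        setattr(main_module, cls.__name__, cls)


payload = simulate_training_pickle()
expose_on_main(ItemSelector)  # in the app: expose_on_main(ItemSelector, TextStats)
model = pickle.loads(payload)
print(type(model).__name__, model.column)
```

In load.py this would mean calling expose_on_main(ItemSelector, TextStats) before pd.read_pickle. The loaded objects are then instances of the imported classes, so dumping the classifier again records them under their real module and the new file no longer depends on __main__. This mutates __main__ globally; when you control the pickle.load call yourself, subclassing pickle.Unpickler and overriding find_class is a more contained alternative.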