I am trying to build an inference pipeline. It consists of two parts: monthly ML model training on tabular order metadata from previous years, and daily inference on the new orders taken that day. There are several string categorical columns I want to include in my model, which I convert to integers with LabelEncoder. How can I make sure the daily inference dataset is converted into the same categories during data preprocessing? Should I save the LabelEncoder's dictionary (its mapping) and apply it to my inference dataset? Thanks.
Am I correct to infer that you're using Python? – Lukasz Tracewski May 16 '19 at 06:45
Yes, I am using Python. – larui529 May 16 '19 at 15:59
Then I hope my post answers your problem. Please check it and mark as solved if that's the case or let us know what's missing / doesn't work. – Lukasz Tracewski May 16 '19 at 16:16
1 Answer
Typically you'd serialise your fitted LabelEncoder, using either the pickle or the joblib module (I'd advise the latter). Code:
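```python
import joblib

# Monthly training job: save the fitted encoder to disk
joblib.dump(label_encoder, 'label_encoder.joblib')

# Daily inference job: load it back and call transform() on the new data
label_encoder = joblib.load('label_encoder.joblib')
```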
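As a sketch of how this splits across the two jobs described in the question (the data, variable names and temporary path below are made up for illustration): the monthly job fits and dumps the encoder, and the daily job only loads it and calls `transform`, so the same string always maps to the same integer.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: training history (monthly job) and one day of new orders
train_pets = ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog']
daily_pets = ['dog', 'monkey', 'cat']

# Monthly training job: fit the encoder once and persist it
encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_pets)
path = os.path.join(tempfile.mkdtemp(), 'label_encoder.joblib')  # illustrative path
joblib.dump(encoder, path)

# Daily inference job: load the fitted encoder and only call transform()
loaded = joblib.load(path)
daily_codes = loaded.transform(daily_pets)

print(loaded.classes_.tolist())  # ['cat', 'dog', 'monkey']
print(daily_codes.tolist())      # [1, 2, 0]
```

Calling `fit_transform` in the daily job instead would rebuild the mapping from that day's orders alone, so codes could drift from what the model was trained on.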
Since you're asking about a dict, I presume you mean packing one LabelEncoder per column into a dictionary, something I often do with DataFrames. Take this example:
```python
import pandas
from collections import defaultdict
from sklearn import preprocessing

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})

# One LabelEncoder per column, created on first access and keyed by column name
d = defaultdict(preprocessing.LabelEncoder)
fit = df.apply(lambda x: d[x.name].fit_transform(x))
```
`fit` now holds the encoded data. We can reverse the encoding with:
```python
fit.apply(lambda x: d[x.name].inverse_transform(x))
```
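For the daily inference data, the same dict can be reused with `transform` instead of `fit_transform`. A minimal sketch, repeating the training example so it runs on its own and assuming a hypothetical `new_df` of new orders that contains only labels seen during training:

```python
from collections import defaultdict

import pandas
from sklearn import preprocessing

# Training data and per-column encoders, as in the example above
df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})
d = defaultdict(preprocessing.LabelEncoder)
fit = df.apply(lambda x: d[x.name].fit_transform(x))

# Hypothetical daily orders containing only labels seen during training
new_df = pandas.DataFrame({
    'pets': ['dog', 'cat'],
    'owner': ['Ron', 'Champ'],
    'location': ['New_York', 'San_Diego'],
})

# transform() reuses each column's learned mapping; fit_transform() here
# would build a new, possibly inconsistent, mapping from the daily data alone
encoded = new_df.apply(lambda x: d[x.name].transform(x))
print(encoded)
```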
To serialise the dictionary of LabelEncoders, you'd follow the same route as with a single one:
```python
joblib.dump(d, 'label_encoder_dict.joblib')
```
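LabelEncoder's `transform` raises a `ValueError` when a column contains a label it never saw during fitting, which can happen with daily orders (e.g. a new pet type). One possible way to handle this, sketched below, is to load the saved dict in the daily job and map unseen labels to a sentinel value. The `encode_or_flag` helper, the `-1` sentinel, the temporary path and the sample data are all assumptions for illustration, not part of scikit-learn; whether a sentinel is acceptable depends on how your model treats it.

```python
import os
import tempfile
from collections import defaultdict

import joblib
import pandas
from sklearn import preprocessing

# Monthly job: fit one LabelEncoder per column and save the whole dict
train = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York'],
})
d = defaultdict(preprocessing.LabelEncoder)
encoded_train = train.apply(lambda x: d[x.name].fit_transform(x))
path = os.path.join(tempfile.mkdtemp(), 'label_encoder_dict.joblib')  # illustrative path
joblib.dump(d, path)

# Daily job: load the dict; 'parrot' (hypothetical) was never seen in training
encoders = joblib.load(path)
daily = pandas.DataFrame({
    'pets': ['monkey', 'parrot'],
    'location': ['San_Diego', 'New_York'],
})


def encode_or_flag(col, unknown=-1):
    """Hypothetical helper: encode known labels, map unseen ones to `unknown`."""
    enc = encoders[col.name]
    known = col.isin(enc.classes_)
    out = pandas.Series(unknown, index=col.index)
    # Only labels present in classes_ reach transform(), which would
    # otherwise raise ValueError for 'parrot'
    out.loc[known] = enc.transform(col[known])
    return out


encoded_daily = daily.apply(encode_or_flag)
print(encoded_daily)
```

Note that because `d` is a `defaultdict`, looking up a column name that was not present at training time silently creates a new, unfitted encoder, so the daily job should only pass the columns the model was trained on.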

Lukasz Tracewski