How can I persistently encode the same string to the same label?
The question "Label encoding across multiple columns in scikit-learn" proposes a nice way to handle a data frame with multiple categorical columns. However, I am unsure whether that approach persists correctly (in a pickle) and would apply the same labels again to freshly incoming data.
So far I have used pandas directly and obtained the labels via .cat.codes of the categorical columns. But now I need to integrate label encoding into a pipeline to deal with freshly incoming data.
Would something like

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# encode every object (string) column in place
for col in df.select_dtypes(include=['object']).columns:
    df[col] = le.fit_transform(df[col])

or the proposed MultiColumnLabelEncoder solution suffice for my task?
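
To make the concern concrete: in the loop above, the single encoder is refit on every column, so after the loop it only remembers the mapping of the last column. Below is a minimal sketch of what I would expect a persistent variant to look like, assuming one LabelEncoder per column is kept in a dict and pickled. The data frames, the column names and the `encoders` dict are made up for illustration, and the frames are built with dtype=object so that the object-column selection behaves the same across pandas versions.

```python
import pickle

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data (column names are made up for illustration)
train = pd.DataFrame(
    {"city": ["Berlin", "Paris", "Berlin"], "size": ["S", "L", "M"]},
    dtype=object,
)

# Fit one encoder per object column and keep each fitted encoder
encoders = {}
for col in train.select_dtypes(include=["object"]).columns:
    encoders[col] = LabelEncoder().fit(train[col])

# Persist the fitted encoders and load them again, as a pipeline would
restored = pickle.loads(pickle.dumps(encoders))

# Freshly incoming data is only transformed, never refit
fresh = pd.DataFrame({"city": ["Paris", "Berlin"], "size": ["M", "S"]}, dtype=object)
encoded = fresh.copy()
for col, enc in restored.items():
    encoded[col] = enc.transform(fresh[col])

print(encoded)
```

Because the restored encoders are never refit, "Berlin" gets the same code in the fresh data as it did during training. Note that LabelEncoder.transform raises a ValueError for a value it did not see during fit, so this sketch does not cover unseen categories.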