I just read the post "Label encoding across multiple columns in scikit-learn", where the author states:
"As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather just have one big LabelEncoder objects that works across all my columns of data."
Is it sensible to do this and why?
To me it seems more natural to have a separate LabelEncoder for each column of the dataframe that holds categorical data.
What happens (in the case of one LabelEncoder shared across all columns) when you encounter unseen data in a specific column?
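
To make the scenario concrete, here is a minimal sketch of what I mean. The column names, the toy values, and the try_transform helper are made up for illustration and are not taken from the linked post. It fits one LabelEncoder per column and one LabelEncoder on the pooled values of all columns, then asks both to encode a value that never appeared in the "size" column.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy training data: two categorical columns.
train = {
    "color": ["red", "green", "blue"],
    "size": ["small", "large", "small"],
}

# Option 1: a separate encoder for each column.
per_column = {name: LabelEncoder().fit(values) for name, values in train.items()}

# Option 2: a single encoder fitted on the values of all columns pooled together.
shared = LabelEncoder().fit(np.concatenate([train["color"], train["size"]]))


def try_transform(encoder, values):
    """Return the encoded labels, or the error message if encoding fails."""
    try:
        return encoder.transform(values).tolist()
    except ValueError as err:
        return f"ValueError: {err}"


# "red" occurs in the "color" column but was never seen in the "size" column.
per_column_result = try_transform(per_column["size"], ["red"])
shared_result = try_transform(shared, ["red"])

print("per-column encoder:", per_column_result)
print("shared encoder:    ", shared_result)
```

In this toy setup the per-column encoder rejects "red" for the "size" column with a ValueError, while the shared encoder accepts it because it learned "red" from the "color" column. That silent acceptance is the behaviour I am unsure about.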