
I just read the post "Label encoding across multiple columns in scikit-learn", where the author states:

As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather just have one big LabelEncoder object that works across all my columns of data.

Is it sensible to do this and why?
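To make the setup concrete, here is a minimal sketch of what "one big LabelEncoder" across all columns could look like. The two-column dataframe is a hypothetical toy example; the encoder is fitted on the flattened values of every column, so all columns share one vocabulary.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy dataframe with two categorical columns.
df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "size": ["S", "M", "L"],
})

# One LabelEncoder fitted on the union of the values of every column.
shared = LabelEncoder()
shared.fit(np.ravel(df.values))

# Every column is transformed with the same, shared vocabulary.
encoded = df.apply(shared.transform)

print(list(shared.classes_))
print(encoded)
```

Note the side effect: "color" has only two categories, yet it receives the codes 3 and 4, because the code assigned to a value depends on the values found in all the other columns.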

To me, it seems more natural to have a separate LabelEncoder for each categorical column of the dataframe.
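A minimal sketch of the per-column alternative, on the same kind of hypothetical toy dataframe: one LabelEncoder per column, kept in a dict keyed by column name so each mapping can be reused later on new data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy dataframe with two categorical columns.
df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "size": ["S", "M", "L"],
})

# One encoder per column; fit() returns the fitted encoder itself.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Transform each column with its own encoder.
encoded = pd.DataFrame(
    {col: enc.transform(df[col]) for col, enc in encoders.items()},
    index=df.index,
)

print({col: list(enc.classes_) for col, enc in encoders.items()})
print(encoded)
```

Each column now gets codes starting at 0 that depend only on its own values, and the dict keeps the bookkeeping for 50+ columns manageable.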

With a single LabelEncoder shared across all columns, what happens when a specific column contains a value that was not seen in that column during fitting?
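The sketch below compares both approaches on new data, assuming a hypothetical training set and a hypothetical new row in which "M" shows up in "color" although it was only ever seen in "size". `LabelEncoder.transform` raises a `ValueError` for labels outside its fitted classes; the helper `try_transform` is a hypothetical wrapper that returns `None` in that case.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data and new data: "M" never appeared in "color".
train = pd.DataFrame({"color": ["red", "green"], "size": ["S", "M"]})
new = pd.DataFrame({"color": ["M"], "size": ["S"]})

shared = LabelEncoder().fit(np.ravel(train.values))
per_col = {c: LabelEncoder().fit(train[c]) for c in train.columns}


def try_transform(encoder, values):
    """Hypothetical helper: return the codes, or None if a label is unseen."""
    try:
        return encoder.transform(values).tolist()
    except ValueError:
        return None


# Shared encoder: "M" is in the global vocabulary, so no error is raised and
# "color" silently receives a code that really belongs to "size".
shared_result = try_transform(shared, new["color"])

# Per-column encoder: the unseen value is detected.
per_col_result = try_transform(per_col["color"], new["color"])

# A value unseen in every column is rejected by the shared encoder as well.
shared_unseen = try_transform(shared, ["blue"])

print(shared_result, per_col_result, shared_unseen)
```

So the shared encoder still fails on values unseen everywhere, but it cannot catch a value that is unseen in one column and merely present in another; that case passes through silently.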

Outcast

  • I wouldn't consider it sensible, but none of the core sklearn developers who answered there said anything about this, so I'm thoroughly confused now. Also, does it make sense to `LabelEncode` *features*, as opposed to labels or classes? – Shihab Shahriar Khan Sep 02 '19 at 16:38
  • @ShihabShahriarKhan, regarding your question, I think the author means the same as you (labels/classes). – Outcast Sep 02 '19 at 16:40
  • Yes, that is possibly the case. Out of curiosity, do you know of any practical use case where 50 multi-class labels are predicted simultaneously from a single sample? I know the NLP case where a document can have many labels, but each of those labels is binary (present or not present). – Shihab Shahriar Khan Sep 02 '19 at 16:51
  • @ShihabShahriarKhan, nothing comes to mind right now ;) – Outcast Sep 02 '19 at 16:52
  • @desertnaut, what is your opinion on the question of my post above? – Outcast Sep 04 '19 at 15:05
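On the side question of encoding *features* rather than labels: scikit-learn also provides `OrdinalEncoder`, which is meant for 2-D feature matrices. A single `OrdinalEncoder` object is fitted once on all columns but learns a separate category list per column, and since scikit-learn 0.24 it can map unseen values to a chosen code via `handle_unknown="use_encoded_value"`. The data below is hypothetical, and `-1` is an arbitrary choice of marker.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training data and new data: "M" never appeared in "color".
train = pd.DataFrame({"color": ["red", "green"], "size": ["S", "M"]})
new = pd.DataFrame({"color": ["M"], "size": ["S"]})

# One object for all columns, but one category list per column.
# handle_unknown="use_encoded_value" requires scikit-learn >= 0.24.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(train)

result = enc.transform(new)

print(enc.categories_)
print(result)
```

This gives the convenience of "one big encoder" without mixing the vocabularies of different columns, and an unseen value is flagged explicitly instead of raising an error or silently borrowing another column's code.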
