How to Label encode multiple non-contiguous dataframe columns

Question

I have a pandas dataframe with multiple columns (some of them non-contiguous) that need to be label encoded. From my understanding of the LabelEncoder class, I need a separate LabelEncoder object for each column. I am using the code below (list_of_string_cols is a list of all the columns that need to be label encoded):

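from sklearn.preprocessing import LabelEncoder

for col in list_of_string_cols:
    # Fit a fresh encoder per column on the training data, then reuse it on the test data
    labelenc = LabelEncoder()
    train_X[col] = labelenc.fit_transform(train_X[col])
    test_X[col] = labelenc.transform(test_X[col])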

Is this the correct way?

score 0 · Accepted Answer · answered Sep 17 '18 at 10:03

Yes, that's correct.

LabelEncoder was primarily designed to deal with labels, not features, so it accepts only a single column at a time.

Up to and including the current version of scikit-learn (0.19.2), what you are using is the correct way of encoding multiple columns. See this question, which takes the same approach:

Label encoding across multiple columns in scikit-learn

From the next version (0.20) onwards, OrdinalEncoder can be used to encode all categorical feature columns at once.
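
As a rough sketch of what that could look like (assuming scikit-learn 0.20 or later; the toy dataframes and the column names "city", "age" and "color" are made up for illustration and are not from the question):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the asker's train_X / test_X.
train_X = pd.DataFrame({
    "city": ["Paris", "Oslo", "Paris"],
    "age": [31, 45, 27],
    "color": ["red", "blue", "red"],
})
test_X = pd.DataFrame({
    "city": ["Oslo", "Paris"],
    "age": [52, 38],
    "color": ["blue", "red"],
})

# The columns to encode do not have to be adjacent in the dataframe.
list_of_string_cols = ["city", "color"]

# One encoder handles all selected columns; it learns a separate
# category-to-integer mapping for each column.
encoder = OrdinalEncoder()
train_X[list_of_string_cols] = encoder.fit_transform(train_X[list_of_string_cols])
test_X[list_of_string_cols] = encoder.transform(test_X[list_of_string_cols])

# The learned categories, one array per encoded column.
print(encoder.categories_)
```

As with the per-column LabelEncoder loop, the encoder is fitted on the training data only, and by default transform raises an error if the test data contains a category that was not seen during fitting. Note that OrdinalEncoder returns floating-point codes rather than integers.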
