I have a pandas DataFrame with multiple columns (some of them non-contiguous) that need to be label encoded. From my understanding of the LabelEncoder class, I need a separate LabelEncoder object for each column. I am using the code below, where list_of_string_cols is a list of all the columns that need to be label encoded:

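from sklearn.preprocessing import LabelEncoder

for col in list_of_string_cols:
    labelenc = LabelEncoder()  # a fresh encoder for each column
    train_X[col] = labelenc.fit_transform(train_X[col])
    test_X[col] = labelenc.transform(test_X[col])  # reuse the mapping learned on train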

Is this the correct way?

Amit Rastogi

1 Answer

Yes, that's correct.

LabelEncoder was primarily made to deal with labels (the target), not features, so it accepts only a single column at a time.
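Because each encoder handles one column, it can be handy to keep the fitted encoders around, for example to map the integer codes back to the original strings later. Here is a minimal sketch of that idea. The toy data, the column names, and the `encoders` dict are my own assumptions for illustration and are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: column names and values are made up for illustration.
train_X = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
    "price": [1.0, 2.0, 3.0],
})
test_X = pd.DataFrame({
    "color": ["blue", "red"],
    "size": ["L", "S"],
    "price": [4.0, 5.0],
})
list_of_string_cols = ["color", "size"]

# Hypothetical helper dict: one fitted LabelEncoder per column.
encoders = {}
for col in list_of_string_cols:
    enc = LabelEncoder()
    train_X[col] = enc.fit_transform(train_X[col])
    # transform() raises an error if test_X contains a value not seen in train_X.
    test_X[col] = enc.transform(test_X[col])
    encoders[col] = enc

# Recover the original strings from the integer codes.
original_colors = encoders["color"].inverse_transform(train_X["color"])
```

Note that the codes follow the sorted order of the values seen during fit, so they carry no meaningful ordering of their own.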

Up until the current version of scikit-learn (0.19.2), what you are using is the correct way of encoding multiple columns. See this question which also does what you are doing:

From the next version (0.20) onwards, OrdinalEncoder can be used to encode all categorical feature columns at once.
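A rough sketch of what that could look like, assuming scikit-learn 0.20 or later is installed. The toy data and column names are made up for illustration. Note that OrdinalEncoder returns floats rather than integers.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data: column names and values are made up for illustration.
train_X = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
    "price": [1.0, 2.0, 3.0],
})
test_X = pd.DataFrame({
    "color": ["blue", "red"],
    "size": ["L", "S"],
    "price": [4.0, 5.0],
})
list_of_string_cols = ["color", "size"]

# One encoder handles all listed columns, which need not be contiguous.
ordenc = OrdinalEncoder()
train_X[list_of_string_cols] = ordenc.fit_transform(train_X[list_of_string_cols])
# By default, transform() raises an error on categories not seen during fit.
test_X[list_of_string_cols] = ordenc.transform(test_X[list_of_string_cols])

# The learned categories, one array per encoded column.
learned = [list(cats) for cats in ordenc.categories_]
```

The fitted encoder also offers `inverse_transform` to get the original values back.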

Vivek Kumar