
Some features are numerical, such as "graduation rate from school", while others are categorical, such as the name of the school. I used a label encoder on the categorical features to transform them into integers.

I now have a dataframe with both floats and integers, representing numerical features and categorical features (transformed with the label encoder) respectively.

I am unsure how to proceed with a learner. Do I need to use one-hot encoding, and if so, how? As far as I understand, I cannot simply pass the dataframe to the sklearn OneHotEncoder because it contains floats. Should I just apply the label encoder to all features to solve the issue?

Sample data from my dataframe. OPEID and opeid6 were transformed using a label encoder

Thanks a lot!

  • Possible duplicate of [sklearn pipeline - how to apply different transformations on different coluns](http://stackoverflow.com/questions/39001956/sklearn-pipeline-how-to-apply-different-transformations-on-different-coluns) – maxymoo Aug 31 '16 at 23:02
  • Possible duplicate of [How can I one hot encode in Python?](http://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python) – Sayali Sonawane Sep 02 '16 at 07:56

1 Answer


Just use the OneHotEncoder categorical_features argument to select which features are categorical:

categorical_features: “all” or array of indices or mask :

Specify what features are treated as categorical.

  • ‘all’ (default): All features are treated as categorical.
  • array of indices: Array of categorical feature indices.
  • mask: Array of length n_features and with dtype=bool.

    Non-categorical features are always stacked to the right of the matrix.
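
Here is a minimal sketch of the same idea for newer scikit-learn releases, where the categorical_features argument is no longer available (it was deprecated in 0.20 and removed in 0.22). It swaps in ColumnTransformer, which is not what the quoted documentation describes. The toy data, the column index, and the names X, categorical_columns and encoder are made up for illustration. With remainder="passthrough", the untouched numerical columns again end up to the right of the one-hot columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): column 0 is a label-encoded school id,
# column 1 is a numerical graduation rate.
X = np.array([
    [0, 0.81],
    [1, 0.65],
    [2, 0.92],
    [1, 0.70],
])

# Indices of the columns to treat as categorical (assumption for this example).
categorical_columns = [0]

encoder = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), categorical_columns)],
    remainder="passthrough",  # keep the numerical columns unchanged
    sparse_threshold=0,       # always return a dense array
)

# Three one-hot columns for the school id, followed by the graduation rate.
X_encoded = encoder.fit_transform(X)
print(X_encoded)
```

With a dataframe instead of an array, the list passed to the transformer can hold column names (for example the label-encoded OPEID and opeid6 columns) rather than integer indices.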

dukebody