Machine Learning: How do I apply one hot encoding on a pandas dataframe with both categorical and numerical features?

Question

Some features are numerical, such as "graduation rate from school", while others are categorical, like the name of the school. I used a label encoder on the categorical features to transform them into integers.

I now have a dataframe with both floats and integers, representing numerical features and categorical features (transformed with the label encoder) respectively.

I am unsure how to proceed with a learner. Do I need to use one-hot encoding, and if so, how? As far as I understand, I cannot simply pass the dataframe to the sklearn OneHotEncoder, since it contains floats. Do I just apply the label encoder to all features to solve the issue?

Sample data from my dataframe. OPEID and opeid6 were transformed using a label encoder

Thanks a lot!

Possible duplicate of [sklearn pipeline - how to apply different transformations on different coluns](http://stackoverflow.com/questions/39001956/sklearn-pipeline-how-to-apply-different-transformations-on-different-coluns) — maxymoo, Aug 31 '16 at 23:02
Possible duplicate of [How can I one hot encode in Python?](http://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python) — Sayali Sonawane, Sep 02 '16 at 07:56

score 0 · Accepted Answer · answered Sep 03 '16 at 06:39

Just use the OneHotEncoder categorical_features argument to select which features are categorical:

categorical_features: “all” or array of indices or mask :

Specify what features are treated as categorical.

‘all’ (default): All features are treated as categorical.

array of indices: Array of categorical feature indices.

mask: Array of length n_features and with dtype=bool.

Non-categorical features are always stacked to the right of the matrix.
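
Note that categorical_features was deprecated in scikit-learn 0.20 and removed in 0.22, so the argument quoted above only works on older versions. On newer versions the same effect is usually obtained with ColumnTransformer. Below is a minimal sketch of that approach, not the original answer's method. It assumes OPEID and opeid6 are the label-encoded columns from the question; the grad_rate column name and all values are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data: values are invented, and "grad_rate" is a hypothetical column name.
df = pd.DataFrame({
    "OPEID": [0, 1, 2, 1],                   # label-encoded categorical
    "opeid6": [0, 1, 1, 0],                  # label-encoded categorical
    "grad_rate": [0.55, 0.72, 0.64, 0.81],   # numerical feature
})

categorical_cols = ["OPEID", "opeid6"]

ct = ColumnTransformer(
    # One-hot encode only the listed categorical columns.
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    # Leave every other column untouched and append it after the encoded ones.
    remainder="passthrough",
    # Always return a dense array instead of a sparse matrix.
    sparse_threshold=0,
)

X = ct.fit_transform(df)
```

As with the old argument, the untouched numerical columns end up to the right of the one-hot columns, so X can be passed straight to a learner.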
