I've got a dataframe like this:

df = pd.DataFrame({'months': ['FEBRUARY', 'MARCH', 'MAY', 'DECEMBER', 'MAY']})

And I want to get:

[['JANUARY', 1], ['FEBRUARY', 2], ['MARCH', 3]]

I think it should be very easy, but when I try this dummy example from sklearn:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [[1,'Male'], [ 3,'Female']]
enc.fit(X)

I get the following error:

 ValueError: could not convert string to float: 'Male'

Thanks in advance.

Matthieu Brucher
  • You need to use a [`LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) before you can use the `OneHotEncoder`, but it also looks like `LabelEncoder` is what you actually want in this case – Dan Nov 12 '18 at 11:04
  • Possible duplicate of [Issue with OneHotEncoder for categorical features](https://stackoverflow.com/questions/43588679/issue-with-onehotencoder-for-categorical-features) – Matthieu Brucher Nov 12 '18 at 11:10
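
As a rough illustration of the `LabelEncoder` suggestion in the comment above, here is a minimal sketch applied to the `months` column from the question. The column name `month_code` is made up for this example. Note that `LabelEncoder` assigns codes by sorting the labels alphabetically, so the codes will not follow calendar order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'months': ['FEBRUARY', 'MARCH', 'MAY', 'DECEMBER', 'MAY']})

# LabelEncoder maps each distinct string to an integer code.
encoder = LabelEncoder()

# 'month_code' is a hypothetical column name chosen for this sketch.
df['month_code'] = encoder.fit_transform(df['months'])

# classes_ holds the labels in sorted (alphabetical) order;
# a label's position in that array is its code.
print(list(encoder.classes_))
print(df)
```

If the numbers need to match the calendar (FEBRUARY as 2, and so on), an explicit mapping is probably a better fit than `LabelEncoder`.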

1 Answer

You can use `map` with a dictionary that assigns a number to each value:

gender = {'male':1,'female':3}
df.gender.map(gender)
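
The same idea can be sketched for the `months` column from the question. This is only one possible approach: it assumes the month names are upper-case English, builds the lookup from the standard library's `calendar.month_name` (which is locale-dependent), and uses the made-up names `month_to_number`, `month_number`, and `pairs`.

```python
import calendar

import pandas as pd

df = pd.DataFrame({'months': ['FEBRUARY', 'MARCH', 'MAY', 'DECEMBER', 'MAY']})

# Build {'JANUARY': 1, ..., 'DECEMBER': 12}.
# Assumption: the default (English) locale, so the names match the data.
month_to_number = {
    calendar.month_name[number].upper(): number for number in range(1, 13)
}

# Series.map looks each value up in the dictionary;
# values missing from the dictionary would become NaN.
df['month_number'] = df['months'].map(month_to_number)

# Pair each month name with its number, as plain Python lists.
pairs = [[name, int(number)]
         for name, number in zip(df['months'], df['month_number'])]
print(pairs)
```

Unlike `LabelEncoder`, the dictionary fixes the numbering explicitly, so it does not depend on which months happen to appear in the data.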
nimrodz