You are closing the parenthesis before actually reshaping the array.
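To illustrate the first point, here is a minimal sketch of why the reshape has to happen before the array reaches the encoder. The column name and data are assumptions chosen for illustration, not the code from the question.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's dataframe).
df = pd.DataFrame({"sex": list("mmmfff")})

values = df["sex"].to_numpy()
reshaped = values.reshape(-1, 1)

print(values.shape)    # 1-D array: OneHotEncoder rejects this
print(reshaped.shape)  # 2-D array with a single feature column

# The reshape must sit inside the fit_transform(...) call:
#   ohe.fit_transform(values.reshape(-1, 1))   # reshapes the input
#   ohe.fit_transform(values).reshape(-1, 1)   # raises before reshape runs
```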
The other problem is that the one-hot encoder creates several columns (one per category), so you cannot assign its output to a single column:
In [1]: from sklearn.preprocessing import OneHotEncoder
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"sex": list("mmmfff")})
In [4]: ohe = OneHotEncoder()
In [5]: ohe.fit_transform(df["sex"].to_numpy().reshape(-1, 1))
Out[5]:
<6x2 sparse matrix of type '<class 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Row format>
In [6]: _.toarray()
Out[6]:
array([[0., 1.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.]])
You can see that we have 2 columns. If you are sure that the column has only 2 distinct values, you can use the drop parameter of OneHotEncoder. With drop="first" it drops the first category, leaving a single column that you can assign to the dataframe:
In [11]: ohe_with_drop = OneHotEncoder(drop="first")
In [12]: ohe_with_drop.fit_transform(df["sex"].to_numpy().reshape(-1, 1)).toarray()
Out[12]:
array([[1.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.]])
In [13]: df["sex_ohe"] = ohe_with_drop.fit_transform(df["sex"].to_numpy().reshape(-1, 1)).toarray()
In [14]: df
Out[14]:
  sex  sex_ohe
0   m      1.0
1   m      1.0
2   m      1.0
3   f      0.0
4   f      0.0
5   f      0.0
See the scikit-learn documentation for more about one hot encoding.
As an alternative, you can use pandas.get_dummies:
In [18]: pd.get_dummies(df["sex"])
Out[18]:
   f  m
0  0  1
1  0  1
2  0  1
3  1  0
4  1  0
5  1  0
See this answer for more.
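The pandas route can also mirror the drop="first" behaviour shown earlier. The following is a minimal sketch using the drop_first and prefix parameters of get_dummies; the names dummies and result are illustrative. Note that the dtype of the indicator column depends on the pandas version (integer in older releases, boolean in 2.0 and later).

```python
import pandas as pd

df = pd.DataFrame({"sex": list("mmmfff")})

# drop_first=True drops the first category ("f"), leaving only the "m" indicator.
dummies = pd.get_dummies(df["sex"], prefix="sex", drop_first=True)

# Join the indicator column(s) back onto the original dataframe by index.
result = df.join(dummies)
print(result)
```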