2

I want to break down a pandas column, similar to the question:

I want to transpose the result and encode it in a "one-hot" style. For example, taking the dataframe df

Col1           Col2
 C      {Apple, Orange, Banana}
 A      {Apple, Grape}
 B      {Banana}

I would like to convert this and get:

df

Col1        C   A   B   
Apple       1   1   0
Orange      1   0   0
Banana      1   0   1
Grape       0   1   0

How can I use pandas/Sklearn to achieve this?

Venkatachalam
Codevan

2 Answers

2

Here is a possible answer (assuming Col1 is your index):

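import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# One row per Col1 value, one 0/1 column per distinct item in Col2
one_hot_encoded = pd.DataFrame(mlb.fit_transform(df['Col2']), columns=mlb.classes_, index=df.index)
# Transpose so that items become rows and Col1 values become columns
one_hot_encoded.T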
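For anyone who wants to try this end to end, here is a minimal sketch on the sample data from the question. It assumes Col1 has been set as the index, as stated above, and the variable names `encoded` and `result` are only illustrative. Note that `mlb.classes_` is sorted, so the rows come out in alphabetical order rather than in the order shown in the question.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data mirroring the question; Col1 is made the index (assumption).
df = pd.DataFrame(
    {
        "Col1": ["C", "A", "B"],
        "Col2": [{"Apple", "Orange", "Banana"}, {"Apple", "Grape"}, {"Banana"}],
    }
).set_index("Col1")

mlb = MultiLabelBinarizer()
# Rows follow df.index (C, A, B); columns are the sorted distinct items.
encoded = pd.DataFrame(
    mlb.fit_transform(df["Col2"]), columns=mlb.classes_, index=df.index
)

# Transpose: items become rows, Col1 values become columns.
result = encoded.T
print(result)
```

If Col1 is still a regular column, calling `df.set_index("Col1")` first, as done here, should give the same layout.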
abcdaire
-1

You could transpose the multi-hot encoded output itself and then build the dataframe from it.

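import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Transpose the encoded array, then label rows with the items and columns with Col1
pd.DataFrame(mlb.fit_transform(df['Col2']).T, columns=df['Col1'], index=mlb.classes_)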

output:

Col1    C   A   B
Apple   1   1   0
Banana  1   0   1
Grape   0   1   0
Orange  1   0   0

Note: strictly speaking, this is still not one-hot encoding. It is multi-hot encoding, just in transposed form, because a single Col1 value can have several items set to 1.
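To make the distinction concrete, here is a small check on the question's sample data (a sketch; `multi_hot` and `labels_per_column` are illustrative names). In a true one-hot encoding every Col1 value would have exactly one 1, so each column sum would be 1. Here the sums are larger for C and A.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data mirroring the question; Col1 stays a regular column here.
df = pd.DataFrame(
    {
        "Col1": ["C", "A", "B"],
        "Col2": [{"Apple", "Orange", "Banana"}, {"Apple", "Grape"}, {"Banana"}],
    }
)

mlb = MultiLabelBinarizer()
multi_hot = pd.DataFrame(
    mlb.fit_transform(df["Col2"]).T, columns=df["Col1"], index=mlb.classes_
)

# Number of items set to 1 for each Col1 value.
labels_per_column = multi_hot.sum(axis=0)
print(labels_per_column)

# True when at least one Col1 value has more than one item, i.e. multi-hot.
print((labels_per_column > 1).any())
```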

Examples

Venkatachalam