Pandas List of Values to Binary Columns

Question

I have a dataframe of users who purchased various items. I want to breakout that list of values into separate columns and have a binary flag for users who purchased that item.

Input:

       A           B
0  James  [123, 456]
1   Mary       [123]
2   John  [456, 789]

Expected Output:

       A           B  123  456  789
0  James  [123, 456]    1    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    1

What I've tried (step by step)

df['B'].explode() is my first step:

The using get_dummies() pd.get_dummies(df['B'].explode()):

   123  456  789
0    1    0    0
0    0    1    0
1    1    0    0
2    0    1    0
2    0    0    1

Join it together on index df.join(pd.get_dummies(df['B'].explode())):

       A           B  123  456  789
0  James  [123, 456]    1    0    0
0  James  [123, 456]    0    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    0
2   John  [456, 789]    0    0    1

Problem:

Now I just need to groupby and combine. However, with millions and millions of rows and customers buying 100s of products, this method of joining/combining is highly inefficient. Is there a more "pandas-friendly" or built in function that does this?

for performance you can also try using [this solution](https://stackoverflow.com/a/51420716/9840637) using `MultiLabelBinarizer` — anky, Apr 07 '20 at 14:19

score 7 · Accepted Answer · answered Apr 07 '20 at 14:03

7

you can replace pd.get_dummies(df['B'].explode() with pd.get_dummies(df.B.explode()).sum(level=0) and join.

answered Apr 07 '20 at 14:03

Quang Hoang

146,074
10
56
74

Pandas List of Values to Binary Columns

1 Answers1