I have a dataframe of users who purchased various items. I want to breakout that list of values into separate columns and have a binary flag for users who purchased that item.
Input:
A B
0 James [123, 456]
1 Mary [123]
2 John [456, 789]
Expected Output:
A B 123 456 789
0 James [123, 456] 1 1 0
1 Mary [123] 1 0 0
2 John [456, 789] 0 1 1
What I've tried (step by step)
df['B'].explode()
is my first step:
0 123
0 456
1 123
2 456
2 789
The using get_dummies()
pd.get_dummies(df['B'].explode())
:
123 456 789
0 1 0 0
0 0 1 0
1 1 0 0
2 0 1 0
2 0 0 1
Join it together on index df.join(pd.get_dummies(df['B'].explode()))
:
A B 123 456 789
0 James [123, 456] 1 0 0
0 James [123, 456] 0 1 0
1 Mary [123] 1 0 0
2 John [456, 789] 0 1 0
2 John [456, 789] 0 0 1
Problem:
Now I just need to groupby and combine. However, with millions and millions of rows and customers buying 100s of products, this method of joining/combining is highly inefficient. Is there a more "pandas-friendly" or built in function that does this?