Merge order with items in columns

Question

I have a dataset with all the order, customer and orderitem information. I wandt to expand my orderitems in new columns, but without losing the information about the customer

CustomerId    OrderId    Item
1    1    CD
1    1    DVD
2    2    CD

And the result should be somehow:

CustomerId    OrderId    CD    DVD
1    1    1    1
2    2    1    0

I tried

df2 = pd.concat([df, pd.get_dummies(df.Item)], axis='columns')
df2 = df2.groupby('CustomerId')

Like this: `pd.crosstab([df.CustomerId, df.OrderId], df.Item).reset_index()` — cs95, Jan 04 '19 at 11:17
I am a bit curious - you need count duplicated values or need `0,1` in outputs? — jezrael, Jan 04 '19 at 11:39

cs95 · Accepted Answer · 2019-01-04T11:42:46.857

Simpler is crosstab;

pd.crosstab([df.CustomerId, df.OrderId], df.Item).reset_index()

   CustomerId  OrderId  CD  DVD
0           1        1   1    1
1           2        2   1    0

Or, pivot_table if performance is important.

df.pivot_table(index=['CustomerId', 'OrderId'], 
               columns=['Item'], 
               aggfunc='size', 
               fill_value=0)

Item                CD  DVD
CustomerId OrderId         
1          1         1    1
2          2         1    0

If you want to use dummies, str.get_dummies is another option:

# Solution similar to @jezrael but with str.get_dummies
(df.set_index(['CustomerId', 'OrderId'])
   .Item.str.get_dummies()
   .sum(level=[0, 1])
   .reset_index())

   CustomerId  OrderId  CD  DVD
0           1        1   1    1
1           2        2   1    0

If you need the indicator,

(df.set_index(['CustomerId', 'OrderId'])
   .Item.str.get_dummies()
   .max(level=[0, 1])
   .reset_index())

@jezrael Before taking the high road, keep in mind this was a duplicate. I reopened it and answered because you did not delete your answer. — cs95, Jan 04 '19 at 11:44
yes, reason was I think OP need something else, but I was wrong, clear dupe... — jezrael, Jan 04 '19 at 11:46

Merge order with items in columns

1 Answers1