I have a data frame df
where the combination of 5 columns: item, fav_topping, cheese, crust, order_yr_month
is unique. There are 1450 rows.
item fav_topping cheese crust order_yr_month #_of_orders
0 Pizza Pineapple PPJ Thin 2020-12 0
1 Pizza Pineapple PPJ Thick 2021-01 0
2 Pizza Pineapple PPJ Thin 2021-02 0
3 Pizza Pineapple PPJ Thick 2021-03 0
4 Pizza Pineapple PPJ Pan 2021-04 9
... ... ... ... ... ... ...
1446 Sandwich Pickles CHD Pan 2022-07 91
1447 Sandwich Pickles CHD Thick 2022-08 91
1448 Sandwich Pickles CHD Thin 2022-09 11
1449 Sandwich Pickles CHD Cheese 2022-10 12
1450 Sandwich Pickles CHD Cheese 2027-12 0
I want to convert this to result_df
which needs to be exported to excel where the unique order_yr_month
needs to be transposed and the #_of_orders
must be aggregated. Notice the number of rows (65) now has the unique combination of only 4 columns item, fav_topping, cheese, crust
item fav_topping cheese crust 2020-12 2021-01 2021-02 2021-03 2021-04 2022-07 2022-08 2022-09 2022-10
0 Pizza Pineapple PPJ Thin 0 0 0 0 0 0 0 0 0
1 Pizza Pineapple PPJ Thick 0 0 0 0 0 0 0 0 0
2 Pizza Pineapple PPJ Cheese 0 0 0 0 0 0 0 0 0
3 Pizza Pineapple PPJ Roast 0 0 0 0 0 0 0 0 0
4 Pizza Pineapple PPJ Pan 0 0 0 0 9 0 0 0 0
... ... ... ...
61 Sandwich Pickles CHD Pan 0 0 0 0 0 91 0 0 0
62 Sandwich Pickles CHD Thick 0 0 0 0 0 0 91 0 0
63 Sandwich Pickles CHD Thin 0 0 0 0 0 0 0 11 0
64 Sandwich Pickles CHD Cheese 0 0 0 0 0 0 0 0 12
65 Sandwich Pickles CHD Roast 0 0 0 0 0 0 0 0 0
What is the pandas way of doing this?
My attempts all failed:
- Using pivot_table on partial table?
- Using transpose
- Using multiindex
- Extracting sub table
index, order_yr_month, #_of_orders
and then transposing them. - The problem here is that the year-month columns are unique inresult_df
however not all rows indf
have all those columns.
UPDATE:
@jezrael pointed me to the right solution here After reading crosstab docs -
This is what worked -
result_df = pd.crosstab(index=[df['item'],df['fav_topping'],df['cheese'],df['crust']], columns=df['order_yr_month'], values=df['orders'], aggfunc='sum')