0

I have a data frame df where the combination of 5 columns: item, fav_topping, cheese, crust, order_yr_month is unique. There are 1450 rows.

        item    fav_topping   cheese   crust    order_yr_month  #_of_orders
0       Pizza   Pineapple       PPJ     Thin        2020-12         0
1       Pizza   Pineapple       PPJ     Thick       2021-01         0
2       Pizza   Pineapple       PPJ     Thin        2021-02         0
3       Pizza   Pineapple       PPJ     Thick       2021-03         0
4       Pizza   Pineapple       PPJ     Pan         2021-04         9
...               ...              ...      ...     ...   ...       ...
1446    Sandwich    Pickles     CHD     Pan         2022-07         91
1447    Sandwich    Pickles     CHD     Thick       2022-08         91
1448    Sandwich    Pickles     CHD     Thin        2022-09         11
1449    Sandwich    Pickles     CHD     Cheese      2022-10         12
1450    Sandwich    Pickles     CHD     Cheese      2027-12         0

I want to convert this to result_df which needs to be exported to excel where the unique order_yr_month needs to be transposed and the #_of_orders must be aggregated. Notice the number of rows (65) now has the unique combination of only 4 columns item, fav_topping, cheese, crust

        item    fav_topping   cheese   crust    2020-12     2021-01     2021-02     2021-03     2021-04     2022-07     2022-08     2022-09     2022-10     
0       Pizza   Pineapple       PPJ     Thin        0           0           0           0           0           0           0           0           0
1       Pizza   Pineapple       PPJ     Thick       0           0           0           0           0           0           0           0           0
2       Pizza   Pineapple       PPJ     Cheese      0           0           0           0           0           0           0           0           0
3       Pizza   Pineapple       PPJ     Roast       0           0           0           0           0           0           0           0           0
4       Pizza   Pineapple       PPJ     Pan         0           0           0           0           9           0           0           0           0
...               ...              ...      ...    
61      Sandwich    Pickles     CHD     Pan         0           0           0           0           0           91          0           0           0
62      Sandwich    Pickles     CHD     Thick       0           0           0           0           0           0           91          0           0
63      Sandwich    Pickles     CHD     Thin        0           0           0           0           0           0           0           11          0
64      Sandwich    Pickles     CHD     Cheese      0           0           0           0           0           0           0           0           12
65      Sandwich    Pickles     CHD     Roast       0           0           0           0           0           0           0           0           0

What is the pandas way of doing this?

My attempts all failed:

  1. Using pivot_table on partial table?
  2. Using transpose
  3. Using multiindex
  4. Extracting sub table index, order_yr_month, #_of_orders and then transposing them. - The problem here is that the year-month columns are unique in result_df however not all rows in df have all those columns.

UPDATE:

@jezrael pointed me to the right solution here After reading crosstab docs -

This is what worked -

result_df = pd.crosstab(index=[df['item'],df['fav_topping'],df['cheese'],df['crust']], columns=df['order_yr_month'],  values=df['orders'], aggfunc='sum')
technazi
  • 888
  • 4
  • 21
  • 42
  • 1
    Can you add your pivot code which failed to question? – jezrael Apr 14 '21 at 05:39
  • @jezrael trying this right now - `pd.crosstab(index=['item','fav_topping','cheese','crust'], columns=['order_yr_month'], values=['#_of_orders'], aggfunc='sum')` – technazi Apr 14 '21 at 05:53
  • hmmm, I think pivot_table solution. Maybe need `df.pivot_table(index=['item','fav_topping','cheese','crust'], columns='order_yr_month', values='#_of_orders', aggfunc='sum', fill_value=0)` – jezrael Apr 14 '21 at 05:55
  • 1
    If failed, can you explain more? – jezrael Apr 14 '21 at 05:56
  • 1
    It worked like a charm man! Thank you so much for pointing to the right question. I'll update the question with the right solution. – technazi Apr 14 '21 at 06:00

0 Answers0