How to drop irrelevant indexes in multi index pivot pandas

Question

I have 2 indexes say: year and month. And I’m using pivot to display the count of products.

Now, in my df, let’s say there are no records of month july and august in 2020. But the pivot will show these months and values 0. I don’t want the pivot to show these irrelevant rows (which are not present in the df) as they make my final pivot very long. How to reduce this?

Here is my example df:

df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                   'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                   'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                  })

And my code so far:

import pandas as pd
import numpy as np
df = pd.read_csv('try.csv')
bins = [0,1000,2000,5000,float(np.inf)]
labels = ['0-1000','1000-2000','2000-5000','5000+']
df['bins'] = pd.cut(df['Amount'],bins=bins, labels=labels, right=True)
pivot = df.pivot_table(index=['Product Type','Product'],columns='bins', aggfunc='size')
pivot.dropna(inplace=True)
pivot

Expected ouput:

Amount                 0-1000  1000-2000  2000-5000  5000+
Product Type Product                                      
Bikes        Hayabusa       1          0          1      0
Cars         BMW M3         1          0          2      0
Fruits       Apple          0          1          1      0
             Banana         0          0          2      0
Vegetable    Apple          0          1          1      0
             Brocoli        0          0          1      0
             Carrot         0          0          1      0
Vegetables   Brocoli        0          1          0      0

In the df, Bikes only contains 'hayabusa', which i want in my pivot's Bike category. How should I do this?

"*I have 2 indexes say: year and month*" -> please be explicit, provide a clear [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), you exact code, and the matching expected output — mozway, Jun 22 '23 at 09:32
thank but you obviously haven't read my link, an image is **not reproducible** you must provide the input data as text — mozway, Jun 23 '23 at 12:08

score 0 · Answer 1 · answered Jun 22 '23 at 19:46

Without knowing exactly what your data looks like, the below is my attempt at providing an answer:

Here is some example data:

details = {
    'year':[2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,
            2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,2021,],
    'month':[1,2,3,4,5,6,7,8,9,10,11,12,
             1,2,3,4,5,6,7,8,9,10,11,12,],
    'product_count':[102,67,36,23,7,6,np.nan,np.nan,5,3,3,2,
                     33,36,53,49,42,56,63,39,42,40,54,19,],
    'product_category':['Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars',
                        'Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars','Cars',]
                        }

df = pd.DataFrame(details)

I have set data in July 2020 and August 2020 to be missing/NaN.

Considering you state using a pivot with indexes of year and month, I am assuming something like the below:

(df
 .pivot(
    index=['year','month'], 
    columns=['product_category'], 
    values=['product_count'])
 .dropna()
)

Notice how chaining ".dropna()" on the end of the pivot excludes July 2020 and August 2020 from the output.

Hi, In my data, I dont have a count column. Also I've tried to make the synthetic data here as close to my actual df. For confidentialilty reasons, im unable to display the real data — Sharan Shetty, Jun 23 '23 at 12:06

score 0 · Accepted Answer · answered Jun 23 '23 at 12:18

Use cut and crosstab:

bins = [0, 1000, 2000, 5000, np.inf]
labels = ['0-1000', '1000-2000', '2000-5000', '5000+']

out = pd.crosstab([df['Product Type'], df['Product']],
                  pd.cut(df['Amount'], bins=bins, labels=labels)
                 ).reindex(columns=labels, fill_value=0)

Output:

Amount                 0-1000  1000-2000  2000-5000  5000+
Product Type Product                                      
Bikes        Hayabusa       1          0          1      0
Cars         BMW M3         1          0          2      0
Fruits       Apple          0          1          1      0
             Banana         0          0          2      0
Vegetable    Apple          0          1          1      0
             Brocoli        0          0          1      0
             Carrot         0          0          1      0
Vegetables   Brocoli        0          1          0      0

Used input:

df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                   'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                   'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                  })

Hey @mozway how to include 2 columns, say I have a column called rich/poor which has values rich,poor and I want it in the column of the crosstable below the bins — Sharan Shetty, Jun 27 '23 at 13:40

How to drop irrelevant indexes in multi index pivot pandas

2 Answers2