1

I have a pandas DataFrame with monthly product use by customers and country, like this:

df = pd.DataFrame(
[
    ('12345', 'CH', 'A', 'Prod 1'),
    ('12345', 'CH', 'A', 'Prod 2'),
    ('67890', 'DE', 'A', 'Prod 1'),
    ('98765', 'CH', 'B', 'Prod 3'),
    ('nnnnn', 'NL', 'C', 'Prod 1')
],
    columns=['Client_ID', 'Country', 'Customer', 'Product Used']
)

I want to list the total amount of product use grouped by customer and country. The pandas groupby feature gets me close to what i need.

df.groupby(['Customer', 'Country','Product Used']).count()

#Reuse Client_ID as Count
Customer    Country Product Used    Client_ID
A           CH      Prod 1          3
                    Prod 2          5
            DE      Prod 1          1
B           CH      Prod 3          2
C           NL      Prod 1          1

Is there a way to include the combinations that don't appear in the data as 0? So my results look like this:

Customer    Country Prod 1  Prod 2  Prod 3
A           CH      3       5       0
            DE      1       0       0
B           CH      0       0       2
C           NL      1       0       0
PaulM
  • 13
  • 2

1 Answers1

1

Use pd.crosstab:

new_df = pd.crosstab([df['Customer'], df['Country']], df['Product Used'])

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0

Or unstack after groupby count, with fill_value=0 then droplevel 0 from columns:

new_df = (
    df.groupby(['Customer', 'Country', 'Product Used']).count()
        .unstack(fill_value=0)
        .droplevel(0, axis=1)
)

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0

Or with pivot_table with aggfunc set to count and fill_value=0:

new_df = (
    df.pivot_table(index=['Customer', 'Country'], columns='Product Used',
                   values='Client_ID', aggfunc='count', fill_value=0)
)

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • found another way `df.groupby(['Customer', 'Country','Product Used']).count().reset_index(inplace=False).pivot_table(index=["Customer", "Country"], columns="Product Used", values="Client_ID", fill_value=0)` but it's too complex – Kiran Kumar Kotari Jul 08 '21 at 16:38
  • 1
    If you're going to use a `pivot_table` it should just be `df.pivot_table(index=['Customer', 'Country'], columns='Product Used', values='Client_ID', aggfunc='count', fill_value=0)` – Henry Ecker Jul 08 '21 at 16:40