0

I'm new to programming overall, and I'm struggling with some pandas df aggregation.

I'm trying to group a df by two columns "A" and "B" and then the series to display the frequency of B, over all the data, not only the group. I'm trying the below. group = df.groupby(['A', 'B']).size() ###this will show only the group frequency of B.

Let's say A is a transaction Id and B is a product. I want to know how many times each product appears when looking over all transactions, but in this structure of grouping, and keeping it into a grouped series not changing back to a df.

Thank you

Andru
  • 23
  • 3
  • 1
    Hi. Welcome to StackOverflow. Please update your post to include a [minimum reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and your expected output. – not_speshal Apr 26 '22 at 15:02

1 Answers1

1

You can use the pd.pivot_table to do the summary:

# Import packages
import pandas as pd, numpy as np

# Initialize a sample dataframe
df = pd.DataFrame({
    "Transacion_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Product": ["milk", "milk", "milk", "milk", "milk",
                          "bread", "bread", "bread", "bread"],
    "Region": ["Eastern", "Eastern", "Eastern", "Eastern", "Eastern",
                          "Western", "Western", "Western", "Western"]

    })

# Display the dataframe
display(df)

# Use pd.pivot_table fuhction to create the summary
table = pd.pivot_table(
                        df, 
                        values='Transacion_ID', 
                        index=['Product'],
                        aggfunc='count')


# Finally show the results               
display(table) 

You can also simply use the groupby function followed by the agg function as follows:

# Groupby and aggregate
table = df.groupby(['Product']).agg({
     'Transacion_ID': 'count'
})

# Finally show the results               
display(table) 
Jane Kathambi
  • 695
  • 6
  • 8