I am grouping by two columns in a Pandas DataFrame, after which I count the size of each group. This grouped DataFrame will then be filtered and the data plotted in a bar chart.
The issue i am having is that if a group has a zero count, it is not shown in the DataFrame and therefore does not appear on the plot. So the plot has missing categories on the x-axis when i would rather them include a category even if there is no bar to display (i.e represent the category as zero, thereby imo making the plot more representative of the data as a whole).
# Import the required packages.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Set the appearance of plots.
plt.style.use('ggplot')
# Create sample DataFrame.
data = {'ID':[1, 2, 3, 4, 5, 6, 7], 'Name':['Tom', 'Jack', 'Anne', 'Steve', 'Ricky', 'Jane', 'Beth'], 'Age':[28,34,29,42,15,10,26], 'Voted':[0, 1, 0, 1, 1, 0, 0]}
df = pd.DataFrame(data)
# Bin into age groups and create an Age Group column in the DataFrame.
bins = list(range(0, 60, 10))
df['Age Group'] = pd.cut(df['Age'], bins, right=False)
# Group data by Age Group and Voted columns. Then perform count using the ID column. Make Age Group the new index.
groups = df.groupby(['Age Group', 'Voted'])
new_df = groups.agg({'ID': 'count'}).rename(columns={'ID':'Count'})
new_df.reset_index(inplace=True)
new_df.set_index('Age Group', inplace=True)
new_df
The code above will output this:
Voted ID
Age Group
[10, 20) 0 1
[10, 20) 1 1
[20, 30) 0 3
[30, 40) 1 1
[40, 50) 1 1
What I'd like is for something like the result below, from which i can filter out just the voted=1 age groups and plot in a chart:
Voted ID
Age Group
[0, 10) 0 0
[0, 10) 1 0
[10, 20) 0 1
[10, 20) 1 1
[20, 30) 0 3
[20, 30) 1 0
[30, 40) 0 0
[30, 40) 1 1
[40, 50) 0 0
[40, 50) 1 1
I have searched the similar questions/results (most relative below), but i can't seem to get either to work.
[Pandas groupby for zero values [Pandas Groupby How to Show Zero Counts in DataFrame
I've also noticed that if i perform the count on just a single column, that zero groups do show up in the DataFrame. Why is this? eg:
# Group data by just Age Group column. Then perform count using the ID column.
groups = df.groupby(['Age Group'])
new_df = groups.agg({'ID': 'count'}).rename(columns={'ID':'Count'})
new_df # count displays the zero here for the 0-10 age group.
Any help in explaining what is going on here would be appreciated.