I have a large DataFrame df (sorted by 'year'):
year gender
1894 male
1895 male
1895 male
1896 male
1900 male
...
2008 male
2008 female
2009 male
2009 female
2009 female
and I want to make a stacked bar chart with 'year' on the x-axis and the number of occurrences of each year value on the y-axis, with the ['gender'] == 'female' count stacked on top of the ['gender'] == 'male' count in each bar.
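To make the target concrete, the table of counts I have in mind could probably be built along these lines. This is only a minimal sketch on a few of the rows above; the names sample and counts are made up for illustration and are not part of my real code.

```python
import pandas as pd

# A few of the rows shown above, used only as illustrative sample data.
sample = pd.DataFrame({
    "year": [1894, 1895, 1895, 1896, 1900, 2008, 2008, 2009, 2009, 2009],
    "gender": ["male", "male", "male", "male", "male",
               "male", "female", "male", "female", "female"],
})

# One row per year, one column per gender; each cell is the number of rows.
counts = pd.crosstab(sample["year"], sample["gender"])

# Put 'male' first so it would be the bottom segment, with 'female' on top.
counts = counts.reindex(columns=["male", "female"], fill_value=0)

print(counts)
```

Each row of counts would then correspond to one bar, and each column to one segment of that bar.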
I tried the following:
import plotly.express as px
df['freq'] = df.groupby('year')['gender'].transform('count')
fig = px.bar(df, x="year", y="freq", color='gender')
fig.show()
However, this takes too long to run and returns a blank graph. So, instead of creating the stacked bar chart with plotly, I tried matplotlib:
import matplotlib.pyplot as plt
df_male = df[df['gender'] == 'male']
df_female = df[df['gender'] == 'female']
X = range(1894, 2010)
plt.bar(X, df_male['year'], color = 'b')
plt.bar(X, df_female['year'], color = 'r', bottom = df_male['year'])
plt.show()
But this raises ValueError: shape mismatch: objects cannot be broadcast to a single shape. I wonder whether this is because some years between 1894 and 2009 do not occur in df (e.g. 1897, 1898, 1899).
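One way to check that suspicion is to compare lengths, since plt.bar needs exactly one height per x position. The sketch below does this on a few sample rows and then builds a per-year count table padded with zeros for the missing years. The names sample and counts are hypothetical, and I am not sure this is the best approach. I also suspect that df_male['year'] holds the year values themselves rather than counts, so it would not be the right height anyway.

```python
import pandas as pd

# A few of the rows shown above, used only as illustrative sample data.
sample = pd.DataFrame({
    "year": [1894, 1895, 1895, 1896, 1900, 2008, 2008, 2009, 2009, 2009],
    "gender": ["male", "male", "male", "male", "male",
               "male", "female", "male", "female", "female"],
})

X = range(1894, 2010)
df_male = sample[sample["gender"] == "male"]

# X has one entry per calendar year; df_male['year'] has one entry per row.
print(len(X), len(df_male["year"]))

# Count rows per (year, gender), pivot gender into columns, then add every
# year in X that is absent from the data as a row of zeros.
counts = (
    sample.groupby(["year", "gender"]).size()
    .unstack(fill_value=0)
    .reindex(index=list(X), columns=["male", "female"], fill_value=0)
)

print(counts.shape)
```

With a table like this, plt.bar(counts.index, counts['male']) followed by plt.bar(counts.index, counts['female'], bottom=counts['male']) would at least receive arrays of equal length.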
Any insights to help me go further would be appreciated.