I wrote a (newbie) Python function, shown below, to draw a bar chart broken out by a primary dimension and, optionally, a secondary dimension. For example, with the sample data below it charts the percentage of people of each gender who have attained each level of education.
Question: how do I overlay on each bar the median household size for that subgroup, e.g. place a point representing the value 3 on the College/Female bar? None of the examples I have found place the point over the correct bar.
I'm extremely new to this, so thank you very much for your help!
```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'Student': ['Alice', 'Bob', 'Chris', 'Dave', 'Edna', 'Frank'],
                   'Education': ['HS', 'HS', 'HS', 'College', 'College', 'HS'],
                   'Household Size': [4, 4, 3, 3, 3, 6],
                   'Gender': ['F', 'M', 'M', 'M', 'F', 'M']})


def MakePercentageFrequencyTable(dataFrame, primaryDimension, secondaryDimension=None, extraAggregatedField=None):
    # Share of each primary level, computed within each secondary group if one is given
    if secondaryDimension is not None:
        counts = dataFrame.groupby(secondaryDimension)[primaryDimension].value_counts(normalize=True)
    else:
        # Name the index explicitly so the resulting column name is the same across pandas versions
        counts = dataFrame[primaryDimension].value_counts(normalize=True).rename_axis(primaryDimension)
    primaryDimensionPercent = counts.mul(100).rename('percentage').reset_index()

    if secondaryDimension is not None:
        primaryDimensionPercent = primaryDimensionPercent.sort_values(secondaryDimension)
        g = sns.catplot(x='percentage', y=secondaryDimension, hue=primaryDimension,
                        kind='bar', data=primaryDimensionPercent)
    else:
        g = sns.catplot(x='percentage', y=primaryDimension, kind='bar', data=primaryDimensionPercent)
    return g


MakePercentageFrequencyTable(dataFrame=df, primaryDimension='Education', secondaryDimension='Gender')
plt.show()

# Question: I want to pass extraAggregatedField='Household Size' when I call the function so that it
# adds a secondary 'Household Size' axis along the top of the figure, computes the median
# 'Household Size' for each subgroup, and plots those medians against the secondary axis,
# each point positioned over its bar:
#
# Female/College => 3
# Female/HS      => 4
# Male/College   => 3
# Male/HS        => 4
```
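One way to get each point onto the right bar is to stop relying on where seaborn happens to place its dodged bars and compute the bar centres yourself. The sketch below swaps `sns.catplot` for plain matplotlib `barh` calls (an assumption made so the bar positions are known exactly), computes the median of the extra field per subgroup with `groupby(...).median()`, and draws those medians with `scatter` on a `twiny()` axis, which shares the y axis with the bars and adds an independent x axis along the top. `plot_with_medians` is a hypothetical name, not part of the original function; its `width=0.8` default mirrors seaborn's default total bar width but is otherwise arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_medians(data, primary, secondary, extra, width=0.8):
    """Hypothetical helper: horizontal grouped bars of percentages, plus the
    median of `extra` for each (secondary, primary) subgroup as a point on a top axis."""
    # Percentage of each primary level within each secondary group
    # (rows: secondary levels, columns: primary levels)
    pct = (data.groupby(secondary)[primary]
               .value_counts(normalize=True)
               .mul(100)
               .unstack(primary, fill_value=0))
    # Median of the extra field per subgroup, aligned to the same row/column grid
    med = (data.groupby([secondary, primary])[extra]
               .median()
               .unstack(primary)
               .reindex(index=pct.index, columns=pct.columns))

    groups = list(pct.index)    # one y slot per secondary level, e.g. ['F', 'M']
    levels = list(pct.columns)  # one bar per primary level in each slot, e.g. ['College', 'HS']
    bar_h = width / len(levels)
    base = np.arange(len(groups))

    fig, ax = plt.subplots()
    ax_top = ax.twiny()  # shares the y axis, adds a separate x axis on top

    for i, level in enumerate(levels):
        # Vertical centre of bar i inside every group slot; reused for the point
        y = base - width / 2 + bar_h * (i + 0.5)
        ax.barh(y, pct[level].to_numpy(), height=bar_h, label=level)
        ax_top.scatter(med[level].to_numpy(), y, color="black", zorder=3)

    ax.set_yticks(base)
    ax.set_yticklabels(groups)
    ax.set_ylabel(secondary)
    ax.set_xlabel("percentage")
    ax_top.set_xlabel(f"median {extra}")
    ax.legend(title=primary)
    return fig, ax, ax_top, med


df = pd.DataFrame({'Student': ['Alice', 'Bob', 'Chris', 'Dave', 'Edna', 'Frank'],
                   'Education': ['HS', 'HS', 'HS', 'College', 'College', 'HS'],
                   'Household Size': [4, 4, 3, 3, 3, 6],
                   'Gender': ['F', 'M', 'M', 'M', 'F', 'M']})

if __name__ == "__main__":
    plot_with_medians(df, "Education", "Gender", "Household Size")
    plt.show()
```

Because each point's y coordinate comes from the same expression as its bar's centre, the two line up regardless of how many groups or levels there are. The same logic could run inside `MakePercentageFrequencyTable` whenever `extraAggregatedField` is not `None`.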