
I have a dataframe with 10,000 rows and 200 columns. To keep things brief, here is a small sample of it:

import pandas as pd

# Sample data: one row per student
data = {'Major': ['Bachelor in Economics', 'Bachelor in Engineering', 'Bachelor in Finance', 'Bachelor in Biology',
                  'Bachelor in Economics', 'Bachelor in Engineering', 'Bachelor in Finance', 'Bachelor in Finance',
                  'Bachelor in Economics', 'Bachelor in Engineering', 'Bachelor in Finance', 'Bachelor in Biology',
                  'Bachelor in Biology', 'Bachelor in Information Systems', 'Bachelor in Marketing'],
        'Gender': ['Male', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male']
        }

df = pd.DataFrame(data)

How can I plot a stacked bar chart in Python with one bar per Major, where each bar is split by Gender? I would really appreciate any help with this.

StupidWolf
  • 45,075
  • 17
  • 40
  • 72

1 Answer


Group by both columns and count the rows to get the tally for each major, then unstack Gender into columns:

df.groupby(['Major','Gender']).size().unstack()

Gender                           Female  Male
Major
Bachelor in Biology                 1.0   2.0
Bachelor in Economics               1.0   2.0
Bachelor in Engineering             2.0   1.0
Bachelor in Finance                 3.0   1.0
Bachelor in Information Systems     1.0   NaN
Bachelor in Marketing               NaN   1.0

Then plot that table as a stacked bar chart:

df.groupby(['Major','Gender']).size().unstack().plot.bar(stacked=True)
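For reference, here is a self-contained sketch of the counts version using the sample data from the question. The `fill_value=0` argument and the axis label are my additions, not part of the one-liner above: filling the missing combinations with 0 keeps the table as integers instead of floats with NaN. The variable name `counts` is only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question: one row per student
df = pd.DataFrame({
    "Major": ["Bachelor in Economics", "Bachelor in Engineering", "Bachelor in Finance",
              "Bachelor in Biology", "Bachelor in Economics", "Bachelor in Engineering",
              "Bachelor in Finance", "Bachelor in Finance", "Bachelor in Economics",
              "Bachelor in Engineering", "Bachelor in Finance", "Bachelor in Biology",
              "Bachelor in Biology", "Bachelor in Information Systems",
              "Bachelor in Marketing"],
    "Gender": ["Male", "Female", "Female", "Female", "Male", "Female", "Male", "Female",
               "Female", "Male", "Female", "Male", "Male", "Female", "Male"],
})

# Count rows per (Major, Gender) pair, then pivot Gender into columns.
# fill_value=0 (an addition to the original one-liner) replaces missing pairs with 0.
counts = df.groupby(["Major", "Gender"]).size().unstack(fill_value=0)

# One bar per major, stacked by gender
ax = counts.plot.bar(stacked=True)
ax.set_ylabel("Number of students")
plt.tight_layout()
# Call plt.show() or ax.get_figure().savefig(...) to render the chart.
```

The y-axis shows raw counts here, which is why it only reaches 4 for this small sample.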


If you need percentages / proportions instead of counts, it might be easier to use pd.crosstab with normalize='index', which scales each major's row so that it sums to 1:

pd.crosstab(df['Major'],df['Gender'],normalize='index').plot.bar(stacked=True)
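Note that `normalize='index'` gives proportions between 0 and 1. If you want the axis in actual percent, one option is to multiply by 100 before plotting. A minimal sketch with the question's sample data (the name `pct` and the axis label are only illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question: one row per student
df = pd.DataFrame({
    "Major": ["Bachelor in Economics", "Bachelor in Engineering", "Bachelor in Finance",
              "Bachelor in Biology", "Bachelor in Economics", "Bachelor in Engineering",
              "Bachelor in Finance", "Bachelor in Finance", "Bachelor in Economics",
              "Bachelor in Engineering", "Bachelor in Finance", "Bachelor in Biology",
              "Bachelor in Biology", "Bachelor in Information Systems",
              "Bachelor in Marketing"],
    "Gender": ["Male", "Female", "Female", "Female", "Male", "Female", "Male", "Female",
               "Female", "Male", "Female", "Male", "Male", "Female", "Male"],
})

# Row-normalised cross-tabulation: each major's row sums to 1, so scale to 100
pct = pd.crosstab(df["Major"], df["Gender"], normalize="index") * 100

# Every stacked bar now has the same total height (100)
ax = pct.plot.bar(stacked=True)
ax.set_ylabel("Share of students (%)")
plt.tight_layout()
# Call plt.show() or ax.get_figure().savefig(...) to render the chart.
```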


StupidWolf
  • Thank you so much for your answer. I would like to know why are the values on the y-axis between 0 and 4 ? How can I get the percentage of each gender in each major ? – Aboudi Shukor Nov 12 '20 at 14:23
  • oh you want it in percentages? and not counts.. gimme a moment to edit – StupidWolf Nov 12 '20 at 14:23
  • Thank you for your help. I really appreciate it. One final question, is it possible to add the exact value above each bar (annotations) ? – Aboudi Shukor Nov 12 '20 at 14:39
  • it might be a bit more complicated. you can check out https://stackoverflow.com/questions/21397549/stack-bar-plot-in-matplotlib-and-add-label-to-each-section – StupidWolf Nov 13 '20 at 16:10
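On the annotation question from the comments: one possible approach, assuming matplotlib 3.4 or newer, is `Axes.bar_label`, which labels every bar in a container. This is a sketch rather than the method from the linked question; the label format, the `label_type="center"` placement, and the choice to leave zero-height segments unlabelled are my assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question: one row per student
df = pd.DataFrame({
    "Major": ["Bachelor in Economics", "Bachelor in Engineering", "Bachelor in Finance",
              "Bachelor in Biology", "Bachelor in Economics", "Bachelor in Engineering",
              "Bachelor in Finance", "Bachelor in Finance", "Bachelor in Economics",
              "Bachelor in Engineering", "Bachelor in Finance", "Bachelor in Biology",
              "Bachelor in Biology", "Bachelor in Information Systems",
              "Bachelor in Marketing"],
    "Gender": ["Male", "Female", "Female", "Female", "Male", "Female", "Male", "Female",
               "Female", "Male", "Female", "Male", "Male", "Female", "Male"],
})

pct = pd.crosstab(df["Major"], df["Gender"], normalize="index") * 100
ax = pct.plot.bar(stacked=True)

# pandas creates one bar container per Gender column; label each segment.
# Requires matplotlib >= 3.4 (bar_label and BarContainer.datavalues).
texts = []
for container in ax.containers:
    # Empty string for zero-height segments so no "0%" label is drawn
    labels = [f"{v:.0f}%" if v > 0 else "" for v in container.datavalues]
    texts.extend(ax.bar_label(container, labels=labels, label_type="center"))

ax.set_ylabel("Share of students (%)")
plt.tight_layout()
# Call plt.show() or ax.get_figure().savefig(...) to render the chart.
```

With `label_type="center"` the text sits in the middle of each segment; the default `label_type="edge"` puts it at the top of the segment instead.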