
Let's say I have the following dataframe with two columns: label, which can be -1, 0 or 1, and years of experience (SSP_years_of_experience), which can be 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9.

        label  SSP_years_of_experience
22640    -1.0                      5.0
181487    1.0                      3.0
327672    0.0                      9.0
254919    0.0                      6.0
136942    1.0                     10.0

My goal is to use this dataframe to create a percentage stacked bar chart where the x-axis is years of experience and each bar corresponds to one years-of-experience value. In other words, there are 10 possible values on the x-axis, and each bar is split into three differently colored segments, one per label. The y-axis should be in percent.

I would know how to do this in R (with ggplot), but I'm new to matplotlib and somewhat new to Python.

Bonus points if I can pass in the two columns as variables (e.g. x, y). More bonus points for showing how to display the number of observations in each bar as text in the chart.

ben890

1 Answer


If your data frame is a pandas DataFrame, try:

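# Column names are kept in variables so they can be swapped easily
exp_name = 'year_of_experience'
label_name = 'label'

# Share of each label within each experience value (fractions that sum to 1 per row)
new_df = (df.groupby(exp_name)[label_name]
            .value_counts(normalize=True)
            .sort_index()
            .unstack()
         )

# One stacked bar per experience value, one colored segment per label
new_df.plot.bar(stacked=True)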
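The snippet above plots fractions between 0 and 1, while the question asks for a y-axis in percent. One possible way to get that, sketched here as an assumption rather than as part of the original answer, is matplotlib's `PercentFormatter`. The toy data, the `rng` generator and the `shares` name are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Hypothetical toy data, only here to make the sketch self-contained
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": rng.choice([-1, 0, 1], size=1000),
    "year_of_experience": rng.integers(0, 10, size=1000),
})

# Fraction of each label within each experience value (each row sums to 1)
shares = (df.groupby("year_of_experience")["label"]
            .value_counts(normalize=True)
            .unstack(fill_value=0))

ax = shares.plot.bar(stacked=True)
# xmax=1.0 tells the formatter that a value of 1.0 should be shown as 100%
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
ax.set_ylabel("share of observations")
```

Formatting the ticks leaves the plotted values untouched. Multiplying `shares` by 100 before plotting would be an alternative if the numbers themselves should be in percent.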

Toy data frame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'label': np.random.choice([-1,0,1], size=1000, replace=True),
                   'year_of_experience': np.random.randint(0,10, 1000)})

Output:

[Image: the resulting stacked bar chart]

Quang Hoang
  • Bonus points: What if years of experience and label are passed in as variables? How would I do it in that case? – ben890 May 09 '19 at 16:58
  • Hi Quang Hoang, also how would you display the number of observations in each bar? – ben890 May 10 '19 at 13:24
  • @ben890 see here for the annotations: https://stackoverflow.com/a/50161387/1011724. Also, I would recommend you put all of this in a function, so something like `def plot_stacked(df, exp_name, label_name, annotations=True):` – Dan May 10 '19 at 13:34
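Putting the suggestions from the comments together, a minimal sketch of such a function might look like the following. The signature comes from Dan's comment; the body is an assumption. It uses `pd.crosstab` for the counts and `Axes.bar_label` (available in matplotlib 3.4 and later) for the annotations, which is not necessarily the approach described in the linked answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter


def plot_stacked(df, exp_name, label_name, annotations=True):
    """Percentage stacked bar chart of label_name within each exp_name value."""
    # Raw counts: rows are x-axis values, columns are labels
    counts = pd.crosstab(df[exp_name], df[label_name])
    # Row-wise shares, so every bar sums to 1
    shares = counts.div(counts.sum(axis=1), axis=0)

    ax = shares.plot.bar(stacked=True)
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
    ax.set_ylabel("share of observations")

    if annotations:
        # ax.containers holds one group of bars per label column, in column order
        for container, label in zip(ax.containers, counts.columns):
            texts = [str(n) if n > 0 else "" for n in counts[label]]
            ax.bar_label(container, labels=texts, label_type="center")
    return ax


# Hypothetical toy data, only for illustration
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "label": rng.choice([-1, 0, 1], size=1000),
    "year_of_experience": rng.integers(0, 10, size=1000),
})
ax = plot_stacked(toy, "year_of_experience", "label")
```

Each segment is labeled with its raw number of observations, while the segment height shows the share, so the chart carries both pieces of information.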