I have a data set like this.

Customer_ID Country Gender Premium_account
C_01        A      M       Yes
C_02        A      F       No
C_03        A      F       Yes
C_04        A      M       Yes
C_05        A      M       No
C_06        B      M       No
C_07        B      M       No
C_08        B      M       Yes
C_09        B      F       Yes
C_10        B      F       No
C_11        B      F       No

Now I would like to plot the Premium_account percentage for each Gender, separated by Country. How could I generate a proportional bar plot like the one in the attached image using Python packages? I am not sure which packages I should use for this, so I cannot provide any code of my own so far. I would appreciate any help that you can provide. Thanks in advance!

[Image: Proportional bar plot]

Chris
Hà Hoàng
  • Ideally you should provide some code to show what you've tried. Can you at least specify a bit more about which libraries you're using (`pandas` will handle this very well) and how you want the plot (e.g. which of `Customer_ID`, `Country` and `Gender` do you want to plot `Premium_account` against)? – Chris Sep 12 '20 at 10:33
  • Thank you Chris for your suggestion. I have updated the question accordingly. However, I really don't know which approach I should take for this problem, so I cannot provide any code of my own. I really appreciate your help. – Hà Hoàng Sep 12 '20 at 10:46
  • `matplotlib` is great for plotting graphs. Found a stacked graph usage here: https://python-graph-gallery.com/13-percent-stacked-barplot/ – drd Sep 12 '20 at 11:08
  • Thank you @drd. It's a very nice suggestion! – Hà Hoàng Sep 12 '20 at 11:46

1 Answer

Here's one way to get the desired result.

Imports and create dataframe

import pandas as pd

data = """Customer_ID Country Gender Premium_account
C_01 A M Yes
C_02 A F No
C_03 A F Yes
C_04 A M Yes
C_05 A M No
C_06 B M No
C_07 B M No
C_08 B M Yes
C_09 B F Yes
C_10 B F No
C_11 B F No"""

df = pd.DataFrame([x.split(' ') for x in data.split('\n')[1:]])
df.columns = data.split('\n')[0].split(' ')
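As an alternative to splitting the string by hand, the same table could probably be loaded with `pd.read_csv` reading from an in-memory buffer. This is only a sketch: the name `df_alt` is introduced here for illustration, and `sep=' '` assumes the sample text stays single-space delimited as above.

```python
import io

import pandas as pd

data = """Customer_ID Country Gender Premium_account
C_01 A M Yes
C_02 A F No
C_03 A F Yes
C_04 A M Yes
C_05 A M No
C_06 B M No
C_07 B M No
C_08 B M Yes
C_09 B F Yes
C_10 B F No
C_11 B F No"""

# df_alt is a hypothetical name; sep=' ' matches the single-space delimiter
# used in the sample text, and the first line is taken as the header.
df_alt = pd.read_csv(io.StringIO(data), sep=' ')
print(df_alt.head())
```

The column names come from the first line of the text, so no separate assignment to `columns` is needed.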

Solution

First we map the Premium_account and Gender columns to binary, so that we can do maths with them.

df.Premium_account = df.Premium_account.map({'Yes': 1, 'No': 0})
df.Gender = df.Gender.map({'F': 1, 'M': 0})
df

    Customer_ID Country  Gender  Premium_account
0         C_01       A       0                1
1         C_02       A       1                0
2         C_03       A       1                1
3         C_04       A       0                1
4         C_05       A       0                0
5         C_06       B       0                0
6         C_07       B       0                0
7         C_08       B       0                1
8         C_09       B       1                1
9         C_10       B       1                0
10        C_11       B       1                0

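Next, we compute each country/gender group's share of all premium accounts, using pandas' groupby method. Dividing by the overall total is optional here, because the constant factor cancels when the rows are normalised in the following step.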

account_proportions = ((df
                       .groupby(['Country', 'Gender'])
                       .Premium_account.sum() / df.Premium_account.sum())
                       .unstack('Gender')
)
account_proportions

Gender     0    1
Country          
A        0.4  0.2
B        0.2  0.2
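The table above holds each group's share of all premium accounts. If the quantity of interest is instead the premium rate within each group (that is one possible reading of the question, not something the question confirms), a groupby mean on the 0/1 column gives it. The sketch below rebuilds the sample data under the hypothetical name `df_demo` and keeps the gender labels as strings.

```python
import pandas as pd

# df_demo is a hypothetical name; rows follow the sample data, C_01 to C_11,
# with Premium_account already encoded as 1 (Yes) / 0 (No).
df_demo = pd.DataFrame({
    'Country': list('AAAAABBBBBB'),
    'Gender': list('MFFMMMMMFFF'),
    'Premium_account': [1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the premium rate
# among the customers in each Country/Gender group.
premium_rate = (df_demo
                .groupby(['Country', 'Gender'])['Premium_account']
                .mean()
                .unstack('Gender'))
print(premium_rate)
```

These rates do not sum to 1 across a row, so they suit a grouped bar chart better than a proportional stacked one.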

Next, we normalise the dataframe row-wise, so that the values for each country sum to 1. This is what gives us a proportional plot instead of just a stacked bar plot.

account_proportions = account_proportions.div(account_proportions.sum(axis=1), axis=0)
account_proportions

Gender          0         1
Country                    
A        0.666667  0.333333
B        0.500000  0.500000
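The row-normalised gender shares can also be obtained in one step with `pd.crosstab` and `normalize='index'`, applied to the premium customers only. This is a sketch of an alternative route, not the method used in the answer; `df_demo`, `premium` and `shares` are names introduced here for illustration.

```python
import pandas as pd

# df_demo is a hypothetical name; rows follow the sample data, C_01 to C_11.
df_demo = pd.DataFrame({
    'Country': list('AAAAABBBBBB'),
    'Gender': list('MFFMMMMMFFF'),
    'Premium_account': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No',
                        'No', 'Yes', 'Yes', 'No', 'No'],
})

# Keep only premium customers, then count them per Country and Gender.
premium = df_demo[df_demo['Premium_account'] == 'Yes']

# normalize='index' divides each row by its own total, so every country's
# row sums to 1.
shares = pd.crosstab(premium['Country'], premium['Gender'], normalize='index')
print(shares)
```

Because the gender labels stay as 'F' and 'M', the columns come out in alphabetical order, which is the reverse of the 0/1 encoding used above.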

Lastly, we can use pandas' built-in plotting method to plot a stacked bar chart. We save the Axes object returned by plot.bar() so that we can relabel the legend entries as 'Male' and 'Female'. The title is passed directly to plot.bar(), and rot=0 keeps the x-axis labels horizontal.

ax = account_proportions.plot.bar(stacked=True,
                                  title='Premium Account Customer Plot',
                                  rot=0)
ax.legend(['Male', 'Female'])

[Image: Premium account proportional plot]

Of course, there are other things that could be done, like changing colours or adding a y-axis label (please add a y-axis label), but this will get you most of the way.
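A minimal sketch of the y-axis label follows. It rebuilds the normalised table by hand so that it runs on its own; the label text, the percent formatting and the `Agg` backend line are choices made for this illustration rather than part of the answer above.

```python
import matplotlib
matplotlib.use('Agg')  # optional: non-interactive backend, drop this in a notebook

import pandas as pd
from matplotlib.ticker import PercentFormatter

# Row-normalised shares from the sample data (0 = male, 1 = female).
account_proportions = pd.DataFrame(
    {0: [2 / 3, 0.5], 1: [1 / 3, 0.5]},
    index=pd.Index(['A', 'B'], name='Country'),
)
account_proportions.columns.name = 'Gender'

ax = account_proportions.plot.bar(stacked=True,
                                  title='Premium Account Customer Plot',
                                  rot=0)
ax.legend(['Male', 'Female'])

# Assumed label text; the values are fractions of 1, so xmax=1.0 makes the
# tick labels read as percentages.
ax.set_ylabel('Share of premium accounts')
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
```

Colours could be changed in the same call, for example by passing a list to the `color` argument of `plot.bar()`.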

Chris