Normalizing and Plotting Data of 3 Columns Grouped by 4th Column

Question

My dataframe is as follows (link to a quick .csv). The values a and b can also be treated as True and False.

+------+------+------+--------+
| COl1 | COl2 | COl3 | Group  |
+------+------+------+--------+
| a    | b    | a    | Yellow |
| b    | b    | a    | Blue   |
| a    | a    | b    | Red    |
| a    | b    | a    | Red    |
| a    | a    | b    | Yellow |
| b    | b    | a    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | b    | Blue   |
| a    | b    | a    | Red    |
| b    | a    | b    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | a    | Blue   |
| b    | a    | b    | Red    |
+------+------+------+--------+

I want a bar plot of the first three columns grouped by the fourth column. The data in the first three columns is categorical, and I would like their normalized counts. All three columns have the same categories (the values a and b). For a single column, I would normally normalize like this:

df_grouped = df_main.groupby('Group')['COl1'].value_counts(normalize=True) * 100
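To see what the single-column version returns, here is a minimal sketch that rebuilds the sample table as `df_main` (using the column names exactly as printed in the table, `COl1` to `COl3`) and applies the same `value_counts(normalize=True)` call. The result is a Series indexed by (Group, value), so each group's percentages add up to 100.

```python
import pandas as pd

# Sample data from the table above, rows in the same order.
df_main = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})

# Percentage of each COl1 value within each Group; MultiIndex (Group, COl1).
col1_pct = df_main.groupby('Group')['COl1'].value_counts(normalize=True) * 100
print(col1_pct)
```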

However, when I group all three columns with the code below, I can't normalize the counts before plotting them as a bar chart:

df_grouped = df_main.groupby('Group')[['COl1', 'COl2', 'COl3']].count().reset_index()
df_grouped.plot.bar()

Grouping the bars as in the image below would be great, if possible. Any help is appreciated.
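One way to normalize all three columns in a single pass, sketched here as an assumption rather than something taken from the question or answers, is to reshape to long format with `melt` so that every (Group, column) pair goes through the same `value_counts(normalize=True)` call. The names `long` and `pct` are illustrative.

```python
import pandas as pd

# Sample data from the question's table, rows in the same order.
df_main = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})
cols = ['COl1', 'COl2', 'COl3']

# Long format: one row per (Group, Column, Value) observation.
long = df_main.melt(id_vars='Group', value_vars=cols,
                    var_name='Column', value_name='Value')

# Normalize within each (Group, Column) pair, convert to percent,
# then spread the values 'a'/'b' into separate columns.
pct = (long.groupby(['Group', 'Column'])['Value']
           .value_counts(normalize=True)
           .mul(100)
           .unstack(fill_value=0))
print(pct)
```

`pct` has one row per (Group, column) and one column per value, so `pct.plot.bar()` should draw the `a` and `b` bars of each column side by side at a single tick, with a gap before the next column.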

score 1 · Answer 1 · answered Apr 12 '21 at 21:21


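Since your data is binary, the mean of a boolean mask (True where the value is `a`) within each group is exactly the normalized count of `a`, so you can use groupby like this: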

(df.iloc[:,:-1].eq('a')          # `True` class
   .groupby(df['Group']).mean()
   .plot.bar()
)

Output:

Quang Hoang
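The numbers behind that plot can be inspected directly. A minimal sketch, assuming the sample table is loaded as `df` with `Group` as its last column (which `iloc[:, :-1]` relies on); `share_a` and `pct_a` are illustrative names:

```python
import pandas as pd

# Sample data from the question's table, rows in the same order.
df = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})

# Boolean mask over every column except the last: True where the value is 'a'.
is_a = df.iloc[:, :-1].eq('a')

# Mean of a boolean column within a group = share of 'a' in that group.
share_a = is_a.groupby(df['Group']).mean()

# Same table expressed in percent.
pct_a = share_a.mul(100)
print(pct_a)
```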


Thanks a lot! Can it somehow be modified to show percentages instead of fractions? – Jishan Apr 12 '21 at 21:24

You can put `.mul(100)` after mean, before plot. – Quang Hoang Apr 12 '21 at 21:50
Or you can use [PercentFormatter](https://stackoverflow.com/questions/31357611/format-y-axis-as-percent) – Quang Hoang Apr 12 '21 at 21:54
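A sketch of the formatter route, assuming matplotlib's `matplotlib.ticker.PercentFormatter` and the same `df` layout as the answer; with `xmax=1` the fractions returned by `mean()` are labelled as percentages without multiplying the data. The `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Sample data from the question's table, rows in the same order.
df = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})

# Share of 'a' per column and group, left as fractions in [0, 1].
share_a = df.iloc[:, :-1].eq('a').groupby(df['Group']).mean()

ax = share_a.plot.bar()
# xmax=1 means a data value of 1.0 is labelled as 100%.
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_ylabel("share of 'a'")
```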

score 1 · Accepted Answer · answered Apr 12 '21 at 21:40


You can also get dummies, group by, and plot if you need to drill deeper into each column:

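import pandas as pd
df1 = pd.get_dummies(df, columns=['COl1', 'COl2', 'COl3'])  # one indicator column per value, e.g. COl1_a
(df1.groupby('Group').mean() * 100).plot(kind='bar')       # percent of rows in each group with that value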

wwnde
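To see what gets plotted, here is a minimal sketch on the sample data (rebuilt as `df`, an assumption) that stops before `.plot`. Each dummy column such as `COl1_a` becomes the percentage of rows in a group with that value, so `COl1_a` and `COl1_b` add up to 100 for every group.

```python
import pandas as pd

# Sample data from the question's table, rows in the same order.
df = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})

# One indicator column per (column, value) pair: COl1_a, COl1_b, ...
df1 = pd.get_dummies(df, columns=['COl1', 'COl2', 'COl3'])

# Mean of an indicator within a group = share of rows with that value.
pct = df1.groupby('Group').mean() * 100
print(pct)
```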


Thanks a lot, this is very easy to understand. Is it possible to group the bars by the categories `a` and `b`? Basically, separating the categories slightly so they are easier to tell apart in the visualization, like COl1's a and b bunched together... – Jishan Apr 12 '21 at 21:50
Did you mean `(df.groupby(['COl1','COl2','COl3'])['Group'].value_counts(normalize=True)*100).unstack().plot.bar()`? – wwnde Apr 12 '21 at 21:54
I meant something like paired bars by position, e.g. COl1's 'a' and 'b' together, then some space, then COl2's 'a' and 'b'. – Jishan Apr 12 '21 at 21:58
Not sure I understand you. Maybe edit the question to include an image of exactly what you need, and we can see if it's doable. – wwnde Apr 12 '21 at 22:07
Sure thing. Added to the question. – Jishan Apr 12 '21 at 22:27
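For the paired layout described in these comments (COl1's `a` and `b` together, a gap, then COl2's pair), one possible approach, not taken from the thread, is to reshape the `get_dummies` percentages so that rows are (Group, column) pairs and columns are `a` and `b`. It assumes the dummy names follow pandas' default `<column>_<value>` pattern, which `rsplit('_', 1)` undoes; `paired` is an illustrative name.

```python
import pandas as pd

# Sample data from the question's table, rows in the same order.
df = pd.DataFrame({
    'COl1': list('abaaabbaabbab'),
    'COl2': list('bbababbababaa'),
    'COl3': list('aababaababaab'),
    'Group': ['Yellow', 'Blue', 'Red', 'Red', 'Yellow', 'Blue', 'Yellow',
              'Blue', 'Red', 'Blue', 'Yellow', 'Blue', 'Red'],
})

df1 = pd.get_dummies(df, columns=['COl1', 'COl2', 'COl3'])
pct = df1.groupby('Group').mean() * 100

# Split labels like 'COl1_a' into a two-level header: ('COl1', 'a').
pct.columns = pd.MultiIndex.from_tuples(
    [tuple(c.rsplit('_', 1)) for c in pct.columns], names=['Column', 'Value'])

# Move the 'Column' level into the rows: index (Group, Column), columns 'a'/'b'.
paired = pct.stack('Column')
print(paired)
```

`paired.plot.bar()` should then give one tick per (Group, column) pair with its `a` and `b` bars adjacent, which is the spacing asked for above. Depending on the pandas version, `stack` may emit a FutureWarning about its implementation; the result here is the same.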
