My dataframe is as follows (link to quick .csv). The values a and b can also be treated as true or false.
+------+------+------+--------+
| COL1 | COL2 | COL3 | Group  |
+------+------+------+--------+
| a    | b    | a    | Yellow |
| b    | b    | a    | Blue   |
| a    | a    | b    | Red    |
| a    | b    | a    | Red    |
| a    | a    | b    | Yellow |
| b    | b    | a    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | b    | Blue   |
| a    | b    | a    | Red    |
| b    | a    | b    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | a    | Blue   |
| b    | a    | b    | Red    |
+------+------+------+--------+
I want a bar plot of the first three columns, grouped by the fourth column. The data in the first three columns are categorical, and I would like to plot their normalized counts. The set of categories (i.e. the values a and b) is the same in all three columns. For a single column, I would normally normalize like this:
df_grouped = df_main.groupby('Group')['COL1'].value_counts(normalize=True)*100
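For reproducibility, here is a minimal sketch that rebuilds the sample table and runs the single-column normalization on it. The column names COL1, COL2 and COL3 are an assumption taken from my code, and the variable name col1_pct is only illustrative; the rows are copied from the table above.

```python
import pandas as pd

# Rows copied from the sample table; column names assumed to be COL1..COL3.
rows = [
    ("a", "b", "a", "Yellow"),
    ("b", "b", "a", "Blue"),
    ("a", "a", "b", "Red"),
    ("a", "b", "a", "Red"),
    ("a", "a", "b", "Yellow"),
    ("b", "b", "a", "Blue"),
    ("b", "b", "a", "Yellow"),
    ("a", "a", "b", "Blue"),
    ("a", "b", "a", "Red"),
    ("b", "a", "b", "Blue"),
    ("b", "b", "a", "Yellow"),
    ("a", "a", "a", "Blue"),
    ("b", "a", "b", "Red"),
]
df_main = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3", "Group"])

# Share (in percent) of each COL1 value within each group.
col1_pct = df_main.groupby("Group")["COL1"].value_counts(normalize=True) * 100
print(col1_pct)
```

The result is a Series indexed by (Group, COL1), so the percentages within each group sum to 100.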
However, when I group all three columns with the code below, I cannot find a way to normalize the counts before plotting them as a bar chart:
df_grouped = df_main.groupby('Group')[['COL1', 'COL2', 'COL3']].count().reset_index()
df_grouped.plot.bar()
Grouping in the plot like below would be great if possible:
Any help is appreciated.