
My dataframe is as follows (the original post linked a quick .csv). The values `a` and `b` can also be treated as True and False.

+------+------+------+--------+
| COl1 | COl2 | COl3 | Group  |
+------+------+------+--------+
| a    | b    | a    | Yellow |
| b    | b    | a    | Blue   |
| a    | a    | b    | Red    |
| a    | b    | a    | Red    |
| a    | a    | b    | Yellow |
| b    | b    | a    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | b    | Blue   |
| a    | b    | a    | Red    |
| b    | a    | b    | Blue   |
| b    | b    | a    | Yellow |
| a    | a    | a    | Blue   |
| b    | a    | b    | Red    |
+------+------+------+--------+
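Since the linked .csv is not reproduced here, the table above can be rebuilt directly for experimenting. This is a minimal sketch: the name `df_main` matches the question's code, the column names follow the table's `COl1` spelling, and the boolean view is only an illustration of treating `a` as True.

```python
import pandas as pd

# Rebuild the sample table (the linked .csv is not reproduced here).
# Each string lists one column's values from the first row to the last.
df_main = pd.DataFrame({
    "COl1": list("abaaabbaabbab"),
    "COl2": list("bbababbababaa"),
    "COl3": list("aababaababaab"),
    "Group": ["Yellow", "Blue", "Red", "Red", "Yellow", "Blue", "Yellow",
              "Blue", "Red", "Blue", "Yellow", "Blue", "Red"],
})

# Optional boolean view: 'a' -> True, 'b' -> False.
as_bool = df_main[["COl1", "COl2", "COl3"]].eq("a")

print(df_main)
print(as_bool.dtypes)
```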

I want a bar plot of the first three columns grouped by the fourth column. The data in the first three columns are categorical, and I would like to plot their normalized counts. All three columns have the same categories (the values a and b). For a single column, I would normally normalize like this:

df_grouped = df_main.groupby('Group')['COl1'].value_counts(normalize=True)*100

However, when I group all three columns with the code below, I cannot normalize the counts before plotting them as a bar chart:

df_grouped = df_main.groupby('Group')[['COl1', 'COl2', 'COl3']].count().reset_index()
df_grouped.plot.bar()
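One way to normalize all three columns at once, sketched here under the assumption that the data looks like the sample table, is to reshape to long format with `melt` and then reuse the single-column `value_counts(normalize=True)` recipe grouped by both `Group` and the source column. The names `long`, `pct`, `wide`, `column` and `category` are illustrative choices, not part of the question.

```python
import pandas as pd

# Sample data from the table above.
df_main = pd.DataFrame({
    "COl1": list("abaaabbaabbab"),
    "COl2": list("bbababbababaa"),
    "COl3": list("aababaababaab"),
    "Group": ["Yellow", "Blue", "Red", "Red", "Yellow", "Blue", "Yellow",
              "Blue", "Red", "Blue", "Yellow", "Blue", "Red"],
})

# Long format: one row per (Group, source column, category) observation.
long = df_main.melt(id_vars="Group", value_vars=["COl1", "COl2", "COl3"],
                    var_name="column", value_name="category")

# Same recipe as the single-column case, but grouped by Group and column,
# so each column's a/b shares add up to 100 within every group.
pct = (long.groupby(["Group", "column"])["category"]
           .value_counts(normalize=True)
           .mul(100)
           .unstack("category", fill_value=0))

# Wide layout for plotting: one row per Group,
# columns ordered (COl1, a), (COl1, b), (COl2, a), ...
wide = pct.unstack("column").swaplevel(axis=1).sort_index(axis=1)
print(wide.round(1))
```

With matplotlib installed, `wide.plot.bar()` should then draw one cluster per group containing the six (column, category) percentages.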

Grouping the bars as in the image I attached (not reproduced here) would be great, if possible. Any help is appreciated.

Jishan

2 Answers


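Since your data is binary, you can compare against `a` and take the group-wise mean; the mean of a boolean column is the share of True values, i.e. the share of `a` in each group: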

(df.iloc[:,:-1].eq('a')          # `True` class
   .groupby(df['Group']).mean()
   .plot.bar()
)
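A self-contained version of this answer, rebuilding the sample data and stopping before the plot so the numbers can be checked. Multiplying by 100 (to match the percentages in the question) and the complementary `share_b` are additions for illustration; the code also assumes `Group` is the last column, as `iloc[:, :-1]` does.

```python
import pandas as pd

df = pd.DataFrame({
    "COl1": list("abaaabbaabbab"),
    "COl2": list("bbababbababaa"),
    "COl3": list("aababaababaab"),
    "Group": ["Yellow", "Blue", "Red", "Red", "Yellow", "Blue", "Yellow",
              "Blue", "Red", "Blue", "Yellow", "Blue", "Red"],
})

# Mean of a boolean column = share of True values, i.e. the share of 'a'.
share_a = df.iloc[:, :-1].eq("a").groupby(df["Group"]).mean() * 100  # percent
share_b = 100 - share_a  # with binary data, 'b' is the complement
print(share_a)
```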

Output:

(bar chart image not reproduced)

Quang Hoang

You can also one-hot encode with `get_dummies`, then group and plot, if you need to drill deeper into each column's categories:

df1=pd.get_dummies(df, columns=['COl1','COl2','COl3'])
(df1.groupby('Group').mean()*100).plot(kind='bar')
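A runnable sketch of this approach on the sample data, without the plot call so the resulting percentages can be inspected. Each `COlN_a`/`COlN_b` pair should sum to 100 within a group, which is a quick sanity check that the normalization is per column.

```python
import pandas as pd

df = pd.DataFrame({
    "COl1": list("abaaabbaabbab"),
    "COl2": list("bbababbababaa"),
    "COl3": list("aababaababaab"),
    "Group": ["Yellow", "Blue", "Red", "Red", "Yellow", "Blue", "Yellow",
              "Blue", "Red", "Blue", "Yellow", "Blue", "Red"],
})

# One indicator column per (column, value) pair: COl1_a, COl1_b, COl2_a, ...
df1 = pd.get_dummies(df, columns=["COl1", "COl2", "COl3"])

# Within a group, the mean of an indicator is the share of that value; * 100 gives percent.
pct = df1.groupby("Group").mean() * 100
print(pct.round(1))
```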

(bar chart image not reproduced)

wwnde
  • Thanks a lot, this is very easy to understand. Is it possible to group the bars by the categories `a` and `b`? Basically, bunching the categories together so they are easier to tell apart in the visualization, e.g. COl1's `a` and `b` next to each other. – Jishan Apr 12 '21 at 21:50
  • Did you mean `(df.groupby(['COl1','COl2','COl3'])['Group'].value_counts(normalize=True)*100).unstack().plot.bar()`? – wwnde Apr 12 '21 at 21:54
  • I meant paired bars by position, e.g. COl1's 'a' and 'b' side by side, then some space, then COl2's 'a' and 'b'. – Jishan Apr 12 '21 at 21:58
  • Not sure I understand. Maybe modify the question to include an image of exactly what you need, and we can see if it's doable. – wwnde Apr 12 '21 at 22:07
  • Sure thing. Added to the question. – Jishan Apr 12 '21 at 22:27
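The attached image is not available, so the following is only a sketch of one plausible reading of the requested layout: x-axis slots `COl1 a`, `COl1 b`, then a gap, then `COl2 a`, `COl2 b`, and so on, with one bar per `Group` inside each slot. The helper names `paired_table`, `slot_positions` and `plot_paired` are hypothetical; the gap is created by manually computing bar positions for matplotlib's `Axes.bar`, which is imported only inside `plot_paired`.

```python
import pandas as pd

df = pd.DataFrame({
    "COl1": list("abaaabbaabbab"),
    "COl2": list("bbababbababaa"),
    "COl3": list("aababaababaab"),
    "Group": ["Yellow", "Blue", "Red", "Red", "Yellow", "Blue", "Yellow",
              "Blue", "Red", "Blue", "Yellow", "Blue", "Red"],
})


def paired_table(df, cols, group_col="Group"):
    """Percent of each category within each group.

    Rows are (column, category) pairs; columns are the groups.
    """
    long = df.melt(id_vars=group_col, value_vars=cols,
                   var_name="column", value_name="category")
    return (long.groupby([group_col, "column"])["category"]
                .value_counts(normalize=True)
                .mul(100)
                .unstack(group_col, fill_value=0)
                .sort_index())


def slot_positions(index, gap=0.5):
    """x position for each (column, category) row, with extra `gap` between columns."""
    positions, x, prev = [], 0.0, None
    for column, _category in index:
        if prev is not None:
            x += 1.0 + (gap if column != prev else 0.0)
        positions.append(x)
        prev = column
    return positions


def plot_paired(table, gap=0.5):
    """Draw one bar per group in each slot (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without it

    xs = slot_positions(table.index, gap)
    groups = list(table.columns)
    width = 0.8 / len(groups)
    fig, ax = plt.subplots()
    for i, g in enumerate(groups):
        offset = (i - (len(groups) - 1) / 2) * width  # centre the group bars on each slot
        ax.bar([x + offset for x in xs], table[g], width=width, label=g)
    ax.set_xticks(xs)
    ax.set_xticklabels([f"{c} {v}" for c, v in table.index])
    ax.set_ylabel("% within group")
    ax.legend(title=table.columns.name)
    return ax


table = paired_table(df, ["COl1", "COl2", "COl3"])
print(table.round(1))
print(slot_positions(table.index))
```

Calling `plot_paired(table)` followed by `matplotlib.pyplot.show()` should render the chart; widening `gap` increases the space between the column pairs.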