I have a dataframe with a categorical variable of interest (here Yes, No, etc.) and a grouping variable (see below):
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ID': range(100),
    'group': np.random.choice(['A', 'B', 'C'], 100),
    'Response': np.random.choice(['Yes', 'No', 'Other', np.nan], 100)})
From this, I would like to retrieve and plot the accumulated data per group in a bar plot.
In detail: for group A, the percentage of Yes, No, etc., and likewise for the other groups.
The command df['Response'].groupby(df['group']).value_counts() already gives me this output:
group Response
A Other 14
No 8
Yes 8
nan 8
B Other 11
nan 11
No 5
Yes 4
C No 9
Yes 9
nan 7
Other 6
Name: Response, dtype: int64
This is what I want but I can't find a way to plot it appropriately (in matplotlib or seaborn) and am unsure if this is an issue of data transformation or visualization.
This question is asking about something similar, but I can't get it to work with unstack:
df = df['group'].unstack(0, fill_value = 0)
gives
AttributeError: 'RangeIndex' object has no attribute 'remove_unused_levels'
and
df = df['group'].unstack(0, fill_value = 0)
df.index.name = None
df.columns.name = None
df.plot.bar(stacked=True)
only plots the ID (ungrouped).
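For reference, df['group'] is a plain column with a single-level RangeIndex, so there is no index level for unstack to pivot, which would explain the AttributeError. Below is a minimal sketch of one possible approach, assuming the goal is the percentage of each response within each group: call unstack on the grouped value_counts result (which has a two-level index) rather than on the column. The names pct and plot_percentages are made up for illustration, and the seed is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# Same construction as above. Note that mixing np.nan with strings makes
# NumPy store it as the string 'nan', so it is counted like any other label.
df = pd.DataFrame({
    'ID': range(100),
    'group': np.random.choice(['A', 'B', 'C'], 100),
    'Response': np.random.choice(['Yes', 'No', 'Other', np.nan], 100)})

# Share of each response within its group, scaled to percent, then pivoted
# so that rows are groups and columns are responses.
pct = (df.groupby('group')['Response']
         .value_counts(normalize=True)
         .mul(100)
         .unstack(fill_value=0))


def plot_percentages(table):
    """Hypothetical helper: one stacked bar per group (requires matplotlib)."""
    ax = table.plot.bar(stacked=True)
    ax.set_ylabel('Percent of responses')
    return ax
```

Calling plot_percentages(pct) should then draw one bar per group, with each bar summing to 100. Dropping the .mul(100) step would give proportions instead, and omitting normalize=True would plot the raw counts.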