plot grouped information from survey

Question

I have a dataframe with a variable of interest (categorical, here Yes, No, etc.) and a grouping variable (see below):

import pandas as pd
import numpy as np
df = pd.DataFrame({ 
    'ID': range(100),
    'group': np.random.choice(['A', 'B', 'C'], 100),
    'Response':np.random.choice(['Yes','No','Other', np.nan], 100)})

From this, I would like to retrieve and plot the accumulated data per group in a bar plot.

In detail: for group A, the percentage of Yes, No, etc., the same for group by. The command df['Response'].groupby(df['group']).value_counts() already gives me this output:

group  Response
A      Other       14
       No           8
       Yes          8
       nan          8
B      Other       11
       nan         11
       No           5
       Yes          4
C      No           9
       Yes          9
       nan          7
       Other        6
Name: Response, dtype: int64

This is what I want but I can't find a way to plot it appropriately (in matplotlib or seaborn) and am unsure if this is an issue of data transformation or visualization.

This question is asking about something similar but I can't get it to work with unstack:

df = df['group'].unstack(0, fill_value = 0)

gives

AttributeError: 'RangeIndex' object has no attribute 'remove_unused_levels'

and

df = df['group'].unstack(0, fill_value = 0)
df.index.name = None
df.columns.name = None
df.plot.bar(stacked=True)

only plots the ID (ungrouped).

score 3 · Accepted Answer · answered Mar 02 '19 at 15:06

3

You should do unstack without select the columns , the groupby output is Series, and notice you are using Series groupby not pd.DataFrame.groupby

df['Response'].groupby(df['group']).value_counts().unstack(fill_value=0)

type(df['Response'].groupby(df['group']).value_counts())
Out[207]: pandas.core.series.Series

answered Mar 02 '19 at 15:06

BENY

317,841
20
164
234

excellent, thank you. Using `df.plot.bar(stacked=True)`, I get the desired plot. How do I scale from absolute numbers to percentages in this plot (the groups are of different sizes)? – Ivo Mar 02 '19 at 15:29
@Ivo yw :-) happy coding – BENY Mar 02 '19 at 15:35
@Ivo pass normalized in value_counts as True – BENY Mar 02 '19 at 17:43

jezrael · Answer 2 · 2019-03-02T15:34:34.733

I think better is use grouping by column in string and get column after groupby for processing:

df1 = df.groupby('group')['Response'].value_counts().unstack(fill_value=0)

Alternative is use crosstab:

df1 = pd.crosstab(df['group'], df['Response'])

For remove index and columns names is possible use your solution or DataFrame.rename_axis:

#pandas 0.24+
df1.rename_axis(index=None, columns=None).plot.bar(stacked=True)

#pandas bellow
#df1.rename_axis(None).rename_axis(None, axis=1).plot.bar(stacked=True)

Your solution is mainly syntactic sugar for the alternative and much more verbose, obviously is using if need process column before, e.g need lowercase of column Response:

df1 = df['Response'].str.lower().groupby(df['group']).value_counts().unstack(fill_value=0)

First solution is possible, but need 2 rows:

df['Response'] = df['Response'].str.lower()
df1 = df.groupby('group')['Response'].value_counts().unstack(fill_value=0)

EDIT:

For scaling use normalize=True in Series.value_counts:

.value_counts(normalize=True)

excellent, thank you. Using `df1.plot.bar(stacked=True)`, I get the desired plot. How do I scale from absolute numbers to percentages in this plot (the groups are of different sizes)? — Ivo, Mar 02 '19 at 15:29
@Ivo - For scaling change `value_counts()` to `value_counts(normalize=True)` — jezrael, Mar 02 '19 at 15:31

plot grouped information from survey

2 Answers2