Value counts by multi-column groupby

Question

I have a Pandas dataframe that I'm grouping by two columns: ID and Item. I am then keeping the last 5 occurrences in each group (maximum), and would finally like to calculate a ratio of the number of occurrences of one specific type.Specifically, I would like to calculate the count of 'U' for each ID-item divided by the total number of occurrences.

I am having trouble with the last step - I am not sure how to get from the general value counts below to a ratio for each ID-Item grouping. Any tips would be much appreciated - thank you!

d={'ID':[1,1,1,1,1,2,2,2,2,2],'Item': ['box','box','ball','ball','box','box','box','ball','box','ball'],'Direction':['D','U','U','D','U','U','D','D','D','D']}
df=pd.DataFrame(data=d)

z=company_item['revision_arrow'].value_counts()

groups=df.groupby(['ID','Item'])
max_count=5
lastfive=groups.head(max_count).reset_index(drop=True)

#count_series=lastfive['Direction'].value_counts()
count_series=groups['Direction'].value_counts()

I believe the first answer in this question is relevant: [link](https://stackoverflow.com/questions/39132742/groupby-value-counts-on-the-dataframe-pandas). Use `.size().unstack(fill_value=0)` on your groupby object. This will get you the occurrences of 'U' for each ID-item, then you can divide by the total number of occurrences. — Stratix, Aug 31 '18 at 18:06

score 5 · Accepted Answer · edited Aug 31 '18 at 18:43

5

When you use value_counts, you have the option to normalize the results. You can use this parameter, and then index the resulting DataFrame to only include the U rows:

out = (df.groupby(['ID', 'Item'])
          .Direction.value_counts(normalize=True)
          .rename('ratio').reset_index())

out.loc[out.Direction.eq('U')]

   ID  Item Direction     ratio
1   1  ball         U  0.500000
2   1   box         U  0.666667
6   2   box         U  0.333333

edited Aug 31 '18 at 18:43

cs95

379,657
97
704
746

answered Aug 31 '18 at 18:03

user3483203

50,081
9
65
94

Thanks. This works great on the sample code, but for some reason I get this error if I try to do it on my actual data: AttributeError: 'DataFrameGroupBy' object has no attribute 'Direction' – Z_D Aug 31 '18 at 18:07
1

@Z_D does your actual DataFrame have a column named `Direction`? – user3483203 Aug 31 '18 at 18:08
Man that was stupid! No it does not - thank you. Fixed now. This is a really nice solution - thanks for introducing this feature to me. – Z_D Aug 31 '18 at 18:10
1

It's something I always run into when I switch between dummy and real data :) Happy to help! – user3483203 Aug 31 '18 at 18:11

score 0 · Answer 2 · answered Aug 31 '18 at 18:05

If you want calculate only count('U') for each ID group divided into all count('U'), you can use apply with your defined function:

d={'ID':[1,1,1,1,1,2,2,2,2,2],'Item': ['box','box','ball','ball','box','box','box','ball','box','ball'],'Direction':['D','U','U','D','U','U','D','D','D','D']}
df=pd.DataFrame(data=d)


def func(df):
    return len(df[df.Direction == 'U'])

df.groupby('ID').apply(func) / len(df[df.Direction == 'U'])

Output:

ID
1    0.75
2    0.25

You can group by ID and Item, if you need.

Value counts by multi-column groupby

2 Answers2