I have the result of a pandas groupby:

grouped = df.groupby(['name','type'])['count'].count().reset_index()

Looks like this:

name  type    count
x     a       32
x     b       1111
x     c       4214

What I need to do is take this and generate percentages, so I would get something like this (I realize the percentages shown are incorrect):

name  type  count
x     a     1%
x     b     49%
x     c     50%

I can think of some pseudocode that might make sense but I haven't been able to get anything that actually works...

Something like:

def getPercentage(df):
    for name in df: 
        total = 0
        where df['name'] = name:
            total = total + df['count'] 
            type_percent = (df['type'] / total) * 100
            return type_percent

df.apply(getPercentage)

Is there a good way to do this with pandas?

Ryan Black
  • Can you provide a short input sample and the output you'd expect given the sample? – smj Dec 20 '18 at 23:56

3 Answers

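Try dividing each (name, type) count by the total for its name:

counts = df.groupby(['name','type'])['count'].count()
grouped = (counts.div(counts.groupby(level='name').sum(), level='name') * 100).reset_index()  # divide by per-name totals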
Jorge

Any Series can be normalized by passing normalize=True to value_counts (it's cleaner than dividing by the count):

Series.value_counts(normalize=True, sort=True, ascending=False)

So it will be something like this (the result is a Series, not a DataFrame):

df['type'].value_counts(normalize=True) * 100
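To get the share of each type within each name, rather than across the whole column, value_counts can also be called on a groupby. This is a minimal sketch, assuming the raw frame has one row per observation; the sample data and the column name percent are made up for illustration:

```python
import pandas as pd

# Hypothetical raw data: one row per observation (values are illustrative only).
df = pd.DataFrame({
    "name": ["x", "x", "x", "x", "y", "y"],
    "type": ["a", "b", "b", "b", "a", "c"],
})

# Fraction of each type within each name, scaled to a percentage.
percent = (
    df.groupby("name")["type"]
      .value_counts(normalize=True)
      .mul(100)
      .rename("percent")  # explicit name, since the default name differs across pandas versions
      .reset_index()
)
print(percent)
```

The percentages for each name sum to 100, because the normalization happens inside each group.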

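or, starting from the grouped frame, you can simply do the following (note that this divides by the overall total, so it gives per-name percentages only when there is a single name):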

total = grouped['count'].sum()
grouped['count'] = grouped['count']/total * 100
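If the grouped frame holds more than one name, the total probably needs to be computed per name. One way to sketch that is groupby(...).transform("sum"), which returns the per-name total aligned with every row. The second name "y" and its counts below are hypothetical, added only to show the effect, and the percent column is an assumed name:

```python
import pandas as pd

# The grouped frame from the question, plus a hypothetical second name "y".
grouped = pd.DataFrame({
    "name": ["x", "x", "x", "y", "y"],
    "type": ["a", "b", "c", "a", "b"],
    "count": [32, 1111, 4214, 10, 30],
})

# Total count per name, broadcast back onto every row of that name.
name_total = grouped.groupby("name")["count"].transform("sum")

# Each row's share of its own name's total, as a percentage.
grouped["percent"] = grouped["count"] / name_total * 100
print(grouped.round(2))
```

Keeping the result in a separate percent column leaves the original counts intact, which makes the numbers easy to check.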
zafrin

Using crosstab + normalize

pd.crosstab(df.name,df.type,normalize='index').stack().reset_index()
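For anyone who wants to see what that one-liner does step by step, here is a rough sketch on made-up data (the sample frame and the percent column name are assumptions). One thing to be aware of: the cross-tabulation has a cell for every name/type pair, so pairs that never occur show up as 0 unless they are filtered out.

```python
import pandas as pd

# Hypothetical raw data: one row per observation (values are illustrative only).
df = pd.DataFrame({
    "name": ["x", "x", "x", "x", "y", "y"],
    "type": ["a", "b", "b", "b", "a", "c"],
})

# Rows are names, columns are types; normalize="index" makes each row sum to 1.
shares = pd.crosstab(df["name"], df["type"], normalize="index")

# Back to long form, with the values scaled to percentages.
long_form = shares.mul(100).stack().reset_index(name="percent")

# Drop name/type pairs that never occur in the data (they appear as 0 here).
long_form = long_form[long_form["percent"] > 0].reset_index(drop=True)
print(long_form)
```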
BENY