Python 2.7: DataFrame groupby and find the percentage distribution of values within group

Question

I have a dataframe and I would like to find the percentage distribution of values in a column within a group.

An example of a group is df.groupby(['race', 'tyre', 'stint']).get_group(("Australian Grand Prix", "Super soft", 1))

I would like to find the percentage that each row's "total diff" value contributes to the total of its group.

Here is the dataframe in dictionary format. There will be many other groups, but the df below only shows the first group.

{'driverRef': {0: 'vettel',
  1: 'raikkonen',
  2: 'rosberg',
  4: 'hamilton',
  6: 'ricciardo',
  7: 'alonso',
  14: 'haryanto'},
 'race': {0: 'Australian Grand Prix',
  1: 'Australian Grand Prix',
  2: 'Australian Grand Prix',
  4: 'Australian Grand Prix',
  6: 'Australian Grand Prix',
  7: 'Australian Grand Prix',
  14: 'Australian Grand Prix'},
 'stint': {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0, 6: 1.0, 7: 1.0, 14: 1.0},
 'total diff': {0: 125147.50728499777,
  1: 281292.0366694695,
  2: 166278.41312954266,
  4: 64044.234019635056,
  6: 648383.28046950256,
  7: 400675.77449897071,
  14: 2846411.2560531585},
 'tyre': {0: u'Super soft',
  1: u'Super soft',
  2: u'Super soft',
  4: u'Super soft',
  6: u'Super soft',
  7: u'Super soft',
  14: u'Super soft'}}

What is the expected output? Do you need [this](https://stackoverflow.com/q/23377108)? — jezrael, Feb 14 '18 at 13:39
@jezrael Yes, I saw it, but I have difficulty applying it to my own problem. Let me try again... — doyz, Feb 14 '18 at 13:43

score 0 · Accepted Answer · answered Feb 14 '18 at 16:12

If I understand correctly what you need, this might help:

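# Total of 'total diff' for each (race, tyre, stint) group
sums = df.groupby(['race', 'tyre', 'stint'])['total diff'].sum()
# Index df by the same keys so each row picks up its group's total, then restore the columns
df = df.set_index(['race', 'tyre', 'stint']).assign(pct=sums).reset_index()
# Replace the group total with each row's share of it (a fraction between 0 and 1)
df['pct'] = df['total diff'] / df['pct']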

#                     race        tyre  stint  driverRef    total diff       pct
# 0  Australian Grand Prix  Super soft    1.0     vettel  1.251475e+05  0.027613
# 1  Australian Grand Prix  Super soft    1.0  raikkonen  2.812920e+05  0.062065
# 2  Australian Grand Prix  Super soft    1.0    rosberg  1.662784e+05  0.036688
# 3  Australian Grand Prix  Super soft    1.0   hamilton  6.404423e+04  0.014131
# 4  Australian Grand Prix  Super soft    1.0  ricciardo  6.483833e+05  0.143060
# 5  Australian Grand Prix  Super soft    1.0     alonso  4.006758e+05  0.088406
# 6  Australian Grand Prix  Super soft    1.0   haryanto  2.846411e+06  0.628037
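
A possible alternative, not part of the answer above, is groupby(...).transform('sum'), which returns each group's total already aligned to the original rows, so no set_index / reset_index round trip is needed. This is a minimal sketch that assumes the same column names as in the question; the values and the second race are made up purely for illustration:

```python
import pandas as pd

# Toy data (made up for illustration): two groups with easy-to-check totals
df = pd.DataFrame({
    'race': ['Australian Grand Prix'] * 3 + ['Bahrain Grand Prix'] * 2,
    'tyre': ['Super soft'] * 5,
    'stint': [1.0] * 5,
    'driverRef': ['vettel', 'raikkonen', 'rosberg', 'vettel', 'rosberg'],
    'total diff': [100.0, 300.0, 600.0, 50.0, 150.0],
})

# transform('sum') broadcasts each group's total back onto its own rows
group_total = df.groupby(['race', 'tyre', 'stint'])['total diff'].transform('sum')

# Each row's share of its group total, as a fraction between 0 and 1
df['pct'] = df['total diff'] / group_total

print(df[['race', 'driverRef', 'total diff', 'pct']])
```

Within every group the pct values add up to 1; multiply by 100 if a percentage rather than a fraction is wanted.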
