I have a dataframe and I would like to find the percentage difference of the values in a column within each group.

An example of a group is df.groupby(['race', 'tyre', 'stint']).get_group(("Australian Grand Prix", "Super soft", 1))

I would like to find the percentage distribution of the "total diff" values across the rows of the group.

Here is the dataframe in dictionary format. There will be many other groups, but the df below shows only the first group.

{'driverRef': {0: 'vettel',
  1: 'raikkonen',
  2: 'rosberg',
  4: 'hamilton',
  6: 'ricciardo',
  7: 'alonso',
  14: 'haryanto'},
 'race': {0: 'Australian Grand Prix',
  1: 'Australian Grand Prix',
  2: 'Australian Grand Prix',
  4: 'Australian Grand Prix',
  6: 'Australian Grand Prix',
  7: 'Australian Grand Prix',
  14: 'Australian Grand Prix'},
 'stint': {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0, 6: 1.0, 7: 1.0, 14: 1.0},
 'total diff': {0: 125147.50728499777,
  1: 281292.0366694695,
  2: 166278.41312954266,
  4: 64044.234019635056,
  6: 648383.28046950256,
  7: 400675.77449897071,
  14: 2846411.2560531585},
 'tyre': {0: u'Super soft',
  1: u'Super soft',
  2: u'Super soft',
  4: u'Super soft',
  6: u'Super soft',
  7: u'Super soft',
  14: u'Super soft'}}
doyz

1 Answer

If I understand correctly what you need, this might help:

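# Sum 'total diff' within each (race, tyre, stint) group
sums = df.groupby(['race', 'tyre', 'stint'])['total diff'].sum()
# Align the group sums to every row through the shared index
df = df.set_index(['race', 'tyre', 'stint']).assign(pct=sums).reset_index()
# Divide each row by its group sum to get its fraction of the group total
df['pct'] = df['total diff'] / df['pct']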

#                     race        tyre  stint  driverRef    total diff       pct
# 0  Australian Grand Prix  Super soft    1.0     vettel  1.251475e+05  0.027613
# 1  Australian Grand Prix  Super soft    1.0  raikkonen  2.812920e+05  0.062065
# 2  Australian Grand Prix  Super soft    1.0    rosberg  1.662784e+05  0.036688
# 3  Australian Grand Prix  Super soft    1.0   hamilton  6.404423e+04  0.014131
# 4  Australian Grand Prix  Super soft    1.0  ricciardo  6.483833e+05  0.143060
# 5  Australian Grand Prix  Super soft    1.0     alonso  4.006758e+05  0.088406
# 6  Australian Grand Prix  Super soft    1.0   haryanto  2.846411e+06  0.628037
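
If the set_index/reset_index round trip is not needed, groupby(...).transform('sum') may be a shorter route to the same numbers. The sketch below assumes the goal is each row's share of its group's 'total diff' sum; it rebuilds the first group from the question's dictionary, and the name group_total is only illustrative.

```python
import pandas as pd

# First group only, copied from the dictionary in the question.
df = pd.DataFrame(
    {
        "driverRef": ["vettel", "raikkonen", "rosberg", "hamilton",
                      "ricciardo", "alonso", "haryanto"],
        "race": ["Australian Grand Prix"] * 7,
        "tyre": ["Super soft"] * 7,
        "stint": [1.0] * 7,
        "total diff": [125147.50728499777, 281292.0366694695,
                       166278.41312954266, 64044.234019635056,
                       648383.28046950256, 400675.77449897071,
                       2846411.2560531585],
    },
    index=[0, 1, 2, 4, 6, 7, 14],
)

# transform('sum') returns the group sum repeated on every row of the group,
# aligned to the original index, so no set_index/reset_index is needed.
group_total = df.groupby(["race", "tyre", "stint"])["total diff"].transform("sum")

# Fraction of the group total; multiply by 100 for a percentage.
df["pct"] = df["total diff"] / group_total
```

Unlike the version above, this keeps the original row labels (0, 1, 2, 4, ...) and the original column order.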
jpp