I'm wondering whether I can do the following in one line, or whether it has to be done in two (I'm coming from R, where I know how to do it in one call). I want to compute the batting average, which requires both the hits (h) and at-bats (ab) columns:

import pandas as pd

batting = pd.DataFrame({'playerID': [1, 1, 1, 2, 2, 2],
                        'h': [80, 97, 95, 30, 35, 22],
                        'ab': [400, 410, 390, 150, 170, 145]})

batters = (batting.groupby('playerID')
                  .agg({'h' : 'sum', 'ab' : 'sum'})
                  .reset_index())

batters['ba'] = batters['h']/batters['ab']
ALollz
Ben G

1 Answer

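DataFrame.eval is your friend. Given an assignment expression, it computes the new column from the existing ones and returns a new DataFrame, so it can sit at the end of the method chain: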

(batting.groupby('playerID')
        .agg({'h' : 'sum', 'ab' : 'sum'})
        .reset_index()
        .eval('ba = h / ab'))

   playerID    h    ab        ba
0         1  272  1200  0.226667
1         2   87   465  0.187097
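
If you would rather not put the expression in a string, the same chain can probably end in assign with a lambda instead of eval. This is a minimal sketch of that alternative, not part of the original answer; the names with_eval, with_assign, and d are only illustrative.

```python
import pandas as pd

batting = pd.DataFrame({'playerID': [1, 1, 1, 2, 2, 2],
                        'h': [80, 97, 95, 30, 35, 22],
                        'ab': [400, 410, 390, 150, 170, 145]})

# The eval version from the answer: the new column is given as a string.
with_eval = (batting.groupby('playerID')
                    .agg({'h': 'sum', 'ab': 'sum'})
                    .reset_index()
                    .eval('ba = h / ab'))

# Alternative sketch: assign receives the intermediate (already summed)
# frame as `d`, so the ratio is computed on the per-player totals.
with_assign = (batting.groupby('playerID')
                      .agg({'h': 'sum', 'ab': 'sum'})
                      .reset_index()
                      .assign(ba=lambda d: d['h'] / d['ab']))

print(with_assign)
```

Both versions stay in one chained expression; eval is terser, while assign keeps the computation as ordinary Python.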

Because h and ab are the only columns besides the grouping key, you can shorten this to:

batting.groupby('playerID', as_index=False).sum().eval('ba = h / ab')

   playerID    h    ab        ba
0         1  272  1200  0.226667
1         2   87   465  0.187097
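
One caveat with the short form: a bare sum() aggregates every non-key column, so it is only equivalent while h and ab are the only other columns. A hedged sketch of a variant that selects the columns explicitly follows; the yearID column is a hypothetical addition for illustration and is not in the original data.

```python
import pandas as pd

batting = pd.DataFrame({'playerID': [1, 1, 1, 2, 2, 2],
                        # Hypothetical extra column, added only to show
                        # why selecting columns explicitly can matter.
                        'yearID': [2001, 2002, 2003, 2001, 2002, 2003],
                        'h': [80, 97, 95, 30, 35, 22],
                        'ab': [400, 410, 390, 150, 170, 145]})

# Select h and ab before summing so yearID is not aggregated,
# then derive the batting average from the per-player totals.
batters = (batting.groupby('playerID', as_index=False)[['h', 'ab']]
                  .sum()
                  .eval('ba = h / ab'))

print(batters)
```

With the selection in place, the result should contain only playerID, h, ab, and ba, regardless of what other columns the input carries.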
cs95