I've looked at the following question:

Apply multiple functions to multiple groupby columns

and I have data along the lines of

                    p.date p.instrument                p.sector  \
11372  2013-02-15 00:00:00            A             Health Care   
11373  2013-02-15 00:00:00           AA               Materials   
11374  2013-02-15 00:00:00         AAPL  Information Technology   
11375  2013-02-15 00:00:00         ABBV             Health Care   
11376  2013-02-15 00:00:00          ABC             Health Care   

                                p.industry    p.retn  p.pfwt     b.bwt  
11372     Health Care Equipment & Services -5.232929     NaN  0.000832  
11373                             Aluminum  0.328947     NaN  0.000907  
11374                    Computer Hardware -1.373927     NaN  0.031137  
11375                      Pharmaceuticals  2.756020     NaN  0.004738  
11376  Health Care Distribution & Services -0.371179     NaN  0.000859 

but when I try:

test1.groupby("p.sector").agg({'r1': lambda x: x['p.pfwt'].sum()})

I get the error

KeyError: 'r1'

I'm trying to create new columns with a set of results from the current DataFrame.

What am I missing? Thanks

smci
Tahnoon Pasha
  • Keys in the aggregation dictionary must correspond to existing columns in the dataframe. There is no 'r1' column in your dataframe, so you cannot aggregate something that doesn't exist. – joaquin Nov 22 '14 at 11:21

1 Answer

Use:

test1.groupby("p.sector").agg({'p.pfwt': np.sum})

See the pandas documentation on groupby aggregation for an example.

  • Keys in the aggregation dictionary must correspond to existing columns in the dataframe. Your code fails because there is no 'r1' column in your dataframe, so it cannot aggregate something that doesn't exist.
  • If you need to rename the result, you can chain a rename after aggregating a single column (a Series groupby), like this: .agg([np.sum, np.mean, np.std]).rename(columns={'sum': 'foo', 'mean': 'bar', 'std': 'baz'})
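If the goal is a result column called r1, one option in newer pandas (0.25 and later) is named aggregation, which was not available when this answer was written. The following is only a sketch: the small test1 frame is a hypothetical stand-in with made-up p.pfwt weights (the question's sample shows NaN there), and r2 is an illustrative extra column.

```python
import pandas as pd

# Hypothetical stand-in for test1: column names follow the question,
# but the p.pfwt weights are made up for illustration.
test1 = pd.DataFrame({
    "p.sector": ["Health Care", "Materials", "Information Technology", "Health Care"],
    "p.pfwt": [0.10, 0.20, 0.30, 0.40],
    "b.bwt": [0.000832, 0.000907, 0.031137, 0.004738],
})

# Named aggregation: new_column=(existing_column, function).
# The keyword becomes the output column name, so 'r1' no longer has to
# exist in the source frame.
result = test1.groupby("p.sector").agg(
    r1=("p.pfwt", "sum"),
    r2=("b.bwt", "mean"),
)
print(result)
```

The tuple's first element must still be an existing column; only the keyword on the left is a new name.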
smci
joaquin
  • Thanks @joaquin. I'd like to add new columns in the results data frame that represent calculated information from the source data frame. Is there any way to do that? – Tahnoon Pasha Nov 22 '14 at 11:35
  • The new information is aggregated in p.pfwt. If you do not like the name, you can change it after aggregation. In any case, the original p.pfwt data will be lost, because you cannot keep the original information after aggregation (at least not without additional processing). – joaquin Nov 22 '14 at 13:50
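One possible form of the "additional processing" mentioned in the last comment is groupby(...).transform, which broadcasts the per-group result back onto the original rows so nothing is lost. This is a sketch under assumptions: the frame is a hypothetical stand-in with made-up weights, and sector_pfwt is an illustrative column name.

```python
import pandas as pd

# Hypothetical stand-in for test1 with made-up p.pfwt weights.
test1 = pd.DataFrame({
    "p.instrument": ["A", "AA", "AAPL", "ABBV"],
    "p.sector": ["Health Care", "Materials", "Information Technology", "Health Care"],
    "p.pfwt": [0.10, 0.20, 0.30, 0.40],
})

# transform returns one value per original row (the group's sum),
# so the result can be stored as a new column next to the source data.
test1["sector_pfwt"] = test1.groupby("p.sector")["p.pfwt"].transform("sum")
print(test1)
```

Unlike agg, this keeps one row per instrument, with the sector total repeated on every row of that sector.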