I have a pandas DataFrame on which the following command works:

house.groupby(['place_name'])['index_nsa'].agg(['first','last'])

It gives me what I want. Now I want to add a custom aggregation that gives me the percentage change between the first and the last value.

I got an error when doing math on the values, so I assumed I had to convert them to numbers.

house.groupby(['place_name'])['index_nsa'].agg({"change in %":[(int('last')-int('first')/int('first')]})

Unfortunately, I only get a syntax error on the last bracket, and I cannot find what causes it.

Does anyone see where I went wrong?

cs95
hmmmbob

1 Answer

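The strings 'first' and 'last' are only names of built-in aggregations, so int('last') cannot turn them into values; the syntax error itself comes from the opening parenthesis after the [ that is never closed. You will need to define and pass a callback to agg here. You can do that in-line with a lambda function: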

house.groupby(['place_name'])['index_nsa'].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0])])

Look closely at the .agg call: to rename the output column, you must pass a list of tuples of the format [(new_name, agg_func), ...].
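
To illustrate the tuple format, here is a minimal sketch on a made-up frame. The place names and index values are invented for the example; only the column names come from the question. It mixes built-in aggregation names with the lambda from above, which returns a fraction (0.5 means a 50% rise):

```python
import pandas as pd

# Toy data: invented values, column names taken from the question.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 150.0, 200.0, 180.0],
})

# Each tuple is (output column name, aggregation).
result = house.groupby("place_name")["index_nsa"].agg([
    ("start", "first"),
    ("end", "last"),
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0]),
])
print(result)
```

The result has one row per place_name and one column per tuple, in the order the tuples were given.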

If you want to avoid the lambda at the cost of some verbosity, you can define a named function:

def first_last_pct(ser):
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

house.groupby(['place_name'])['index_nsa'].agg([("change in %", first_last_pct)])
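
As a rough end-to-end check, the named-function version can be run against a small made-up frame. The data below is invented, and first_last_pct_100 is a hypothetical variant that multiplies by 100, since the functions above return a fraction rather than a percentage. The last part shows an alternative that is not from the answer: it reuses the built-in 'first' and 'last' aggregations from the question and does the arithmetic on the resulting columns.

```python
import pandas as pd

# Toy data: invented values, column names taken from the question.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 150.0, 200.0, 180.0],
})

def first_last_pct_100(ser):
    """Percentage change from the first to the last value of a group."""
    first, last = ser.iloc[0], ser.iloc[-1]
    return 100 * (last - first) / first

# Named function passed via the (new_name, agg_func) tuple format.
by_func = house.groupby("place_name")["index_nsa"].agg(
    [("change in %", first_last_pct_100)]
)

# Alternative (an assumption, not the answer's method): aggregate with the
# built-ins first, then compute the change on whole columns.
ends = house.groupby("place_name")["index_nsa"].agg(["first", "last"])
by_cols = 100 * (ends["last"] - ends["first"]) / ends["first"]

print(by_func)
print(by_cols)
```

Note that the built-in 'first' and 'last' skip missing values, whereas iloc[0] and iloc[-1] do not, so the two approaches can disagree on data containing NaN.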
cs95