Pandas group by to compute new columns from old

Question

I am trying to compute a new column from an existing one, separately within each group. That's not a very good explanation, so let me show some code:

import pandas as pd
df = pd.DataFrame({"state": ["ma", "ny", "dc", "ma", "ny", "dc", "ma", "ny", "dc", "dc"], "v": [1,2,3,2,1,2,3,4,1,10], "w": [1,1,1,1,1,1,1,1,1,10]})
print(df)

outputs:

  state   v   w
0    ma   1   1
1    ny   2   1
2    dc   3   1
3    ma   2   1
4    ny   1   1
5    dc   2   1
6    ma   3   1
7    ny   4   1
8    dc   1   1
9    dc  10  10

I would like to apply the same calculation to each state separately. For example:

 df.assign(diffv=df.groupby('state')['v'].diff())

This gives me a new column diffv where each observation is the change in v from the previous observation for the same state.

  state   v   w  diffv
0    ma   1   1    NaN
1    ny   2   1    NaN
2    dc   3   1    NaN
3    ma   2   1    1.0
4    ny   1   1   -1.0
5    dc   2   1   -1.0
6    ma   3   1    1.0
7    ny   4   1    3.0
8    dc   1   1   -1.0
9    dc  10  10    9.0

Now I've written a function doubles(Series) which, given a Series, produces a new Series where each entry says how far back in that Series you would have to go to find a value half as large. In other words, how quickly did the value double? It works something like this (you can argue with the fractions, but that's the idea):

v  dbl
1  NaN
2  1
3  1.5
4  2
5  2.5
6  3
7  3.5

I would like to use it the same way I use diff():

 df.assign(diffv=df.groupby('state')['v'].doubles())

That won't work of course, but I feel I am close!

Funny. I know I tried "it" before, and it didn't work but I wrote this question from memory! Should I just delete the question or do you want to officially answer it? — pitosalas, Apr 20 '20 at 18:20
Sorry @Ben.t I updated the question to the deeper question I was working towards so your comment now doesn't make sense anymore. That's my fault. — pitosalas, Apr 21 '20 at 01:28
Because `doubles` is not a built-in pandas method, you need a slightly different syntax. Try `df.groupby('state')['v'].apply(doubles)` or `df.groupby('state')['v'].apply(lambda x: doubles(x))` — Ben.T, Apr 21 '20 at 12:09
[this](https://stackoverflow.com/questions/15374597/apply-function-to-pandas-groupby) is not exactly a dup, but I think it may be interesting! — Ben.T, Apr 21 '20 at 12:23
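
For anyone landing here, the suggestion in the comments can be sketched roughly as follows. The question never shows the real doubles, so the version below is only a hypothetical stand-in: it returns the whole number of rows back to the most recent value that is at most half the current one, with no fractional interpolation. The column name dblv is also made up for illustration. The sketch uses transform rather than apply because transform returns a result aligned to the original row index, which assign needs; depending on the pandas version, apply may add the group key to the index.

```python
import numpy as np
import pandas as pd


def doubles(s: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the asker's doubles().

    For each entry, count how many rows back the most recent value
    that is at most half the current value sits. NaN if there is none.
    """
    values = s.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(len(values)):
        # Walk backwards from the previous row to the start of the series.
        for j in range(i - 1, -1, -1):
            if values[j] <= values[i] / 2:
                out[i] = i - j
                break
    # Keep the incoming index so the result lines up with the group.
    return pd.Series(out, index=s.index)


df = pd.DataFrame({
    "state": ["ma", "ny", "dc", "ma", "ny", "dc", "ma", "ny", "dc", "dc"],
    "v": [1, 2, 3, 2, 1, 2, 3, 4, 1, 10],
    "w": [1, 1, 1, 1, 1, 1, 1, 1, 1, 10],
})

# transform calls doubles once per state and stitches the pieces back
# together in the original row order.
result = df.assign(dblv=df.groupby("state")["v"].transform(doubles))
print(result)
```

Any function that takes a group's Series and returns a Series of the same length with the same index should slot in the same way, so the real doubles could presumably replace the stand-in unchanged.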
