pandas data manipulation in python

Question

I have a data frame df with columns ID and N1. I would like to calculate a column N2 with the following logic: within each ID, the first value should equal N1, and each subsequent value should be the current N1 divided by the previous N1 (for example, 0.888/0.999), and so on. The same logic restarts for the next ID. Can this be done in pandas WITHOUT using a for loop?

ID  N1  N2
1111    0.999   0.999
1111    0.888   0.888888889
1111    0.777   0.875
1111    0.666   0.857142857
1111    0.555   0.833333333
1111    0.444   0.8
1111    0.333   0.75
2222    0.998   0.998
2222    0.887   0.888777555
2222    0.776   0.874859076
2222    0.665   0.856958763
2222    0.554   0.833082707
2222    0.443   0.799638989
2222    0.332   0.749435666
2222    0.221   0.665662651

score 5 · Answer 1 · answered Apr 05 '17 at 20:22

This is 1 plus the percentage change within each ID. The first row of each ID has no previous value, so it is filled with N1.

df.assign(N2=df.groupby('ID').N1.pct_change().add(1).fillna(df.N1))

      ID     N1        N2
0   1111  0.999  0.999000
1   1111  0.888  0.888889
2   1111  0.777  0.875000
3   1111  0.666  0.857143
4   1111  0.555  0.833333
5   1111  0.444  0.800000
6   1111  0.333  0.750000
7   2222  0.998  0.998000
8   2222  0.887  0.888778
9   2222  0.776  0.874859
10  2222  0.665  0.856959
11  2222  0.554  0.833083
12  2222  0.443  0.799639
13  2222  0.332  0.749436
14  2222  0.221  0.665663
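Why this works: pct_change computes (current - previous) / previous, which equals current / previous - 1, so adding 1 gives back the ratio. Below is a minimal, self-contained sketch of the same one-liner. It rebuilds the sample data from the question; the names `ratio` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1111] * 7 + [2222] * 8,
    "N1": [0.999, 0.888, 0.777, 0.666, 0.555, 0.444, 0.333,
           0.998, 0.887, 0.776, 0.665, 0.554, 0.443, 0.332, 0.221],
})

# pct_change within each ID gives current/previous - 1; add 1 to get the ratio.
ratio = df.groupby("ID")["N1"].pct_change().add(1)

# The first row of each ID has no previous value (NaN), so fall back to N1.
result = df.assign(N2=ratio.fillna(df["N1"]))
print(result)
```

Note that assign returns a new data frame and leaves df itself unchanged.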

piRSquared

`1 plus the percentage change` - very clever ! – MaxU - stand with Ukraine Apr 05 '17 at 20:26
@MaxU what's clever is what you / Jeff did in [**this post**](http://stackoverflow.com/a/41784854/2336654), and then what I did with that [**here**](http://stackoverflow.com/a/43239617/2336654) – piRSquared Apr 05 '17 at 20:28
thanks for the link - i didn't see your brilliant answer ;-) – MaxU - stand with Ukraine Apr 05 '17 at 20:31
@MaxU thx.. I was proud of that one, will save it for later :-) – piRSquared Apr 05 '17 at 20:34

score 3 · Accepted Answer · answered Apr 05 '17 at 20:15

Yes, you can use groupby(), transform() and shift(), then fillna(1) so that the first value in each group is divided by 1 and stays equal to N1.

df['N2'] = df.groupby("ID")['N1'].transform(lambda x: x/x.shift(1).fillna(1))
df

      ID     N1        N2
0   1111  0.999  0.999000
1   1111  0.888  0.888889
2   1111  0.777  0.875000
3   1111  0.666  0.857143
4   1111  0.555  0.833333
5   1111  0.444  0.800000
6   1111  0.333  0.750000
7   2222  0.998  0.998000
8   2222  0.887  0.888778
9   2222  0.776  0.874859
10  2222  0.665  0.856959
11  2222  0.554  0.833083
12  2222  0.443  0.799639
13  2222  0.332  0.749436
14  2222  0.221  0.665663
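A possible variation, offered only as a sketch: transform with a Python lambda calls the function once per group, which may be slow when there are many IDs. The grouped shift can instead be taken in one call and the division done on whole columns. The sample data is rebuilt from the question; `prev` and `via_transform` are illustrative names, and no timings were measured here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1111] * 7 + [2222] * 8,
    "N1": [0.999, 0.888, 0.777, 0.666, 0.555, 0.444, 0.333,
           0.998, 0.887, 0.776, 0.665, 0.554, 0.443, 0.332, 0.221],
})

# Previous N1 within each ID; NaN on the first row of every group.
prev = df.groupby("ID")["N1"].shift()

# Dividing by 1 on the first row of each group leaves N1 unchanged there.
df["N2"] = df["N1"] / prev.fillna(1)

# The lambda-based version from the answer, kept for comparison.
via_transform = df.groupby("ID")["N1"].transform(lambda x: x / x.shift(1).fillna(1))
print(df)
```

Both versions should give the same N2; the difference is only in how the per-group shift is computed.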

Scott Boston

This one takes less time when you deal with a large data set, so I accepted this answer. – BigDataScientist Apr 06 '17 at 13:11
This is very slow on a large data frame. Is there a NumPy solution that would be faster? – BigDataScientist Apr 13 '17 at 13:34
@user2684128 You might repost the problem with an emphasis on using NumPy. – Scott Boston Apr 13 '17 at 13:54
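On the NumPy question raised in the comments, here is one possible sketch rather than a benchmarked solution. It assumes the data frame is non-empty and that the rows of each ID are contiguous (for example, sorted by ID), as in the sample data; if they are not, the group boundaries below would be wrong. The names `ids`, `n1`, `prev` and `first` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; rows of each ID are contiguous.
df = pd.DataFrame({
    "ID": [1111] * 7 + [2222] * 8,
    "N1": [0.999, 0.888, 0.777, 0.666, 0.555, 0.444, 0.333,
           0.998, 0.887, 0.776, 0.665, 0.554, 0.443, 0.332, 0.221],
})

ids = df["ID"].to_numpy()
n1 = df["N1"].to_numpy()

# Previous value for every row, ignoring group boundaries for now.
prev = np.empty_like(n1)
prev[0] = 1.0
prev[1:] = n1[:-1]

# Mark the first row of each contiguous ID block.
first = np.empty(len(ids), dtype=bool)
first[0] = True
first[1:] = ids[1:] != ids[:-1]

# On those rows divide by 1 so that N2 equals N1.
prev[first] = 1.0

df["N2"] = n1 / prev
print(df)
```

Whether this is actually faster than the grouped shift depends on the data and the pandas version, so it is worth timing on the real data frame.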
