I have a DataFrame with two columns, a and b, and I would like to populate a third column, c, based on the following three conditions:

  • if a.diff() > 0 then c = b.shift() + b
  • elif a.diff() < 0 then c = b.shift() - b
  • elif a.diff() == 0 then c = b.shift()

What is a Pythonic, one-liner way of doing this?

Example:

     a    b    c    
0    2   10  NaN
1    3   16   26
2    1   12    4
3    1   18   12
4    3   11   29
5    1   13   -2
KOB
  • Check: https://stackoverflow.com/questions/39405628/how-do-i-create-a-new-column-based-on-multiple-conditions-from-multiple-columns – Sruthi Feb 16 '18 at 15:38

1 Answer

Use numpy.select, and cache the shifted and diffed Series for better performance and readability:

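import numpy as np

diff = df.a.diff()        # change in a relative to the previous row
shifted = df.b.shift()    # b from the previous row

# the default covers diff == 0 and the first row, where diff is NaN
df['c'] = np.select([diff > 0, diff < 0], [shifted + df.b, shifted - df.b], default=shifted)
print(df)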
   a   b     c
0  2  10   NaN
1  3  16  26.0
2  1  12   4.0
3  1  18  12.0
4  3  11  29.0
5  1  13  -2.0
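
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the example frame from the question and applies the same np.select logic. The c_sign column is an assumption of this sketch and not part of the answer above: since the three cases only differ in the sign applied to b, np.sign(diff) should give the same result in a single expression.

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({"a": [2, 3, 1, 1, 3, 1], "b": [10, 16, 12, 18, 11, 13]})

diff = df["a"].diff()      # NaN for the first row
shifted = df["b"].shift()  # previous row's b, NaN for the first row

# Same approach as the answer: the first matching condition picks the choice,
# and the default handles diff == 0 as well as the NaN first row.
df["c"] = np.select(
    [diff > 0, diff < 0],
    [shifted + df["b"], shifted - df["b"]],
    default=shifted,
)

# Hypothetical one-expression variant (not from the answer):
# sign(diff) is +1, -1 or 0, which matches the three cases.
df["c_sign"] = shifted + np.sign(diff) * df["b"]

print(df)
```

The sign-based form is shorter, but np.select is easier to extend if the branches stop being simple sign flips of b.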
jezrael