I have a data-frame which looks like:

A         B       C
13.06   12.95   -0.11
92.56   104.63  12.07
116.49  219.27  102.78
272.11  487.26  215.15
300.11  780.75  480.64

There are about 1 million records.

I want to create a column D, which is calculated as below:

The first value of column D is 0, and then:

D3 = (D2+1)*C3/B3

D4 = (D3+1)*C4/B4

Each value in column D depends on the previous value in column D.

Here is the result:

D
0
0.115358884
0.52281017
0.672397915
1.02955022

I can solve it using a for loop and loc, but it takes a lot of time. Can I solve it in a more efficient, Pythonic way?

MAC

1 Answer

Recursive calculations like this are not vectorisable; to improve performance, use numba:

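import numpy as np
from numba import jit

@jit(nopython=True)
def f(a, b, c):
    # a is used only for its length; D itself depends on b and c
    d = np.empty(a.shape)
    d[0] = 0  # the first value of D is defined as 0
    for i in range(1, a.shape[0]):
        # each value depends on the previous one
        d[i] = (d[i-1] + 1) * c[i] / b[i]
    return d

df['D'] = f(df['A'].to_numpy(), df['B'].to_numpy(), df['C'].to_numpy())
print(df)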
        A       B       C         D
0   13.06   12.95   -0.11  0.000000
1   92.56  104.63   12.07  0.115359
2  116.49  219.27  102.78  0.522810
3  272.11  487.26  215.15  0.672398
4  300.11  780.75  480.64  1.029550
jezrael
  • This gives an error: TypingError: Failed in nopython mode pipeline (step: nopython frontend) non-precise type array(pyobject, 1d, C) [1] During: typing of argument at (3) File "", line 3: def calc(a,b,c): d = np.empty(a.shape) ^ – MAC Jun 09 '20 at 09:19
  • @MAC - Are the columns filled with numeric values? – jezrael Jun 09 '20 at 09:20
  • df = {'HW':['13.06', '92.56', '116.49', '272.11','300.11'], 'IBC':[12.95, 104.63, 219.27, 487.26,780.75],'jik':[-0.11, 12.07, 102.78, 215.15,480.64]} df = pd.DataFrame(df) – MAC Jun 09 '20 at 09:22
  • 1
    @MAC - exactly, problem is `HW` is filled by strings, not numeric. Need convert it to numbers, check [this](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) – jezrael Jun 09 '20 at 09:23
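
To illustrate the dtype fix discussed in the comments, here is a minimal sketch using the sample frame posted above. It assumes `pd.to_numeric` is an acceptable way to convert the `HW` column. The helper name `recurrence` is hypothetical, and it is written as a plain Python loop so the snippet runs without numba; the same body could presumably be decorated with `@jit(nopython=True)` as in the answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the comment: 'HW' holds strings, so its dtype is object,
# which is the kind of array numba's nopython mode cannot type.
df = pd.DataFrame({
    'HW': ['13.06', '92.56', '116.49', '272.11', '300.11'],
    'IBC': [12.95, 104.63, 219.27, 487.26, 780.75],
    'jik': [-0.11, 12.07, 102.78, 215.15, 480.64],
})

# Convert the string column to floats (raises on unparseable values by default).
df['HW'] = pd.to_numeric(df['HW'])

def recurrence(b, c):
    """Hypothetical helper: D[0] = 0, D[i] = (D[i-1] + 1) * c[i] / b[i]."""
    d = np.empty(b.shape[0])
    d[0] = 0.0
    for i in range(1, b.shape[0]):
        d[i] = (d[i - 1] + 1) * c[i] / b[i]
    return d

# Pass plain float arrays, as the answer does with to_numpy().
df['D'] = recurrence(df['IBC'].to_numpy(), df['jik'].to_numpy())
print(df.dtypes)
print(df)
```

Checking `df.dtypes` before calling the compiled function is a quick way to spot any remaining object columns.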