I have the following pandas DataFrame, where all values in the third column (Ratio) are the same:

import pandas as pd 

df = pd.DataFrame([[2, 10, 0.5], 
                   [float('NaN'), 10, 0.5], 
                   [float('NaN'), 5, 0.5]], columns=['Col1', 'Col2', 'Ratio'])
╔══════╦══════╦═══════╗
║ Col1 ║ Col2 ║ Ratio ║
╠══════╬══════╬═══════╣
║ 2    ║   10 ║ 0.5   ║
║ NaN  ║   10 ║ 0.5   ║
║ NaN  ║    5 ║ 0.5   ║
╚══════╩══════╩═══════╝

Is there a function provided by pandas that, for each row, multiplies Col1 by Ratio, adds Col2 to that product, and writes the result into Col1 of the next row?

Output example:

╔══════╦══════╦═══════╗
║ Col1 ║ Col2 ║ Ratio ║
╠══════╬══════╬═══════╣
║ 2    ║   10 ║ 0.5   ║
║ 11   ║   10 ║ 0.5   ║
║ 15.5 ║    5 ║ 0.5   ║
╚══════╩══════╩═══════╝
yatu
Snedecor
  • better use for loop – BENY Mar 04 '20 at 15:05
  • @YOBEN_S that's what I want to avoid, if possible. – Snedecor Mar 04 '20 at 15:06
  • I'm not sure you can, as in your operation each row depends on the result of the previous row, so they must be executed in order. (Maybe you can avoid an explicit loop using `apply` or something, but that just loops under the hood.) – Adam.Er8 Mar 04 '20 at 15:11
  • And does the ratio vary across the rows? – Quang Hoang Mar 04 '20 at 15:11
  • @QuangHoang Nope, the ratio stays the same in all rows. – Snedecor Mar 04 '20 at 15:13
  • I think you're looking for window functions in pandas. I think [this](https://stackoverflow.com/questions/38878917/how-to-invoke-pandas-rolling-apply-with-parameters-from-multiple-column) Stack Overflow question might point you in the right direction – Caleb McNevin Mar 04 '20 at 15:28
  • @Adam.Er8 What you said is very similar to `ewm`, which is vectorizable. However, it might not be worth implementing such a solution here. – Quang Hoang Mar 04 '20 at 15:33
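
The `ewm` remark can be made concrete when Ratio is constant, as the asker confirmed. Unrolling a[i] = a[i-1] * r + b[i-1] gives a[n] = r**n * (a[0] + sum over k from 1 to n of b[k-1] / r**k), which needs only a cumulative sum. The following is a minimal sketch under that assumption, not something proposed in the thread; the function name `recurrence_closed_form` is hypothetical, and it assumes r is non-zero.

```python
import numpy as np

def recurrence_closed_form(a0, b, r):
    """Return a with a[0] = a0 and a[i] = a[i-1] * r + b[i-1], for a constant r != 0."""
    b = np.asarray(b, dtype=float)
    n = b.shape[0]
    powers = r ** np.arange(n)          # r**0, r**1, ..., r**(n-1)
    # shifted[k] = b[k-1] / r**k for k >= 1, and 0 for k == 0
    shifted = np.zeros(n)
    shifted[1:] = b[:-1] / powers[1:]
    # a[n] = r**n * (a0 + shifted[0] + ... + shifted[n])
    return powers * (a0 + np.cumsum(shifted))

# Values taken from the question: Col1[0] = 2, Col2 = [10, 10, 5], Ratio = 0.5
result = recurrence_closed_form(2.0, [10, 10, 5], 0.5)
```

Because the formula divides by r**k and multiplies by r**n, it can overflow or underflow on long columns when r is far from 1, so it is probably only suitable for short series; the sequential loop does not have that problem.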

1 Answer

I think numba is the way to go for a loop like this if performance is important:

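from numba import jit

# Each row depends on the previous one, so the loop is sequential; numba compiles it.
@jit(nopython=True)
def f(a, b, c):
    # a[i] = previous Col1 * previous Ratio + previous Col2
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] * c[i-1] + b[i-1]
    return a

# copy=True so the loop writes to a fresh, writable array rather than a view of the DataFrame
df['Col1'] = f(df['Col1'].to_numpy(copy=True), df['Col2'].to_numpy(), df['Ratio'].to_numpy())
print(df)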
   Col1  Col2  Ratio
0   2.0    10    0.5
1  11.0    10    0.5
2  15.5     5    0.5
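
For small frames, or to cross-check the compiled function, the same recurrence can be written with only the standard library. This is a sketch that is not part of the original answer: the name `recurrence_accumulate` is hypothetical, it assumes plain Python lists as input (for example `df['Col2'].tolist()`), and it needs Python 3.8+ for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate

def recurrence_accumulate(a0, b, c):
    """Return [a0, a1, ...] where a[i] = a[i-1] * c[i-1] + b[i-1]."""
    # The last row's b and c are never used, because there is no following row to fill.
    steps = zip(b[:-1], c[:-1])
    # accumulate carries the previous Col1 value forward; bc is a (Col2, Ratio) pair.
    return list(accumulate(steps, lambda prev, bc: prev * bc[1] + bc[0], initial=a0))

# Values taken from the question's DataFrame
result = recurrence_accumulate(2.0, [10, 10, 5], [0.5, 0.5, 0.5])
print(result)  # [2.0, 11.0, 15.5]
```

This still loops under the hood, as noted in the comments, so it is unlikely to beat the numba version on large columns; it mainly avoids the extra dependency.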
jezrael
  • Since it seems there is no built-in pandas function to achieve the goal of the question, I've marked this as the correct answer because at least it is focused on performance. – Snedecor Mar 05 '20 at 12:26