Pandas dataframe forward-fill with decay

Question

I am running Python 3.5, and Pandas v 0.19.2. I have a dataframe like below. Forward-filling the missing values is straight-forward.

import pandas as pd
import numpy as np

d = {'A': np.array([10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0]),
     'B': np.array([np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan ])}
df = pd.DataFrame(d)
df_filled = df.fillna(axis='index', method='ffill')
print(df_filled)
Out[8]: 
      A    B
0  10.0  NaN
1  10.0  NaN
2  10.0  5.0
3  -3.0 -3.0
4  -3.0 -3.0
5   4.0 -3.0
6   4.0  0.0
7   0.0  0.0

My question is: what is the best way to implement a forward fill with decay? I understand the pd.ffill() and pd.fillna() do not support this. For instance, the output I am after is the below (in contrast with the regular ffill above), where the value carried over halves at each period:

Out[5]: 
      A    B
0  10.0  NaN
1   5.0  NaN
2   2.5  5.0
3  -3.0 -3.0
4  -1.5 -1.5
5   4.0 -0.75
6   2.0  0.0
7   0.0  0.0

Hi, how is decay supposed to work here? Just half of previous values? — rpanai, Jun 22 '18 at 16:36
Yes, as an example just half the previous value. I have added the regular `ffill()` output and the desired for comparison. — Zhubarb, Jun 22 '18 at 16:39
Anyway you could ask to add this as a feature on [github](https://github.com/pandas-dev/pandas). — rpanai, Jun 22 '18 at 18:35

cs95 · Accepted Answer · 2018-06-22T16:51:17.073

5

Yes, there's no simple way to do this. I'd recommend doing this one column at a time, using groupby and apply.

for c in df:
    df[c] = df[c].groupby(df[c].notnull().cumsum()).apply(
        lambda y: y.ffill() / 2 ** np.arange(len(y))
    )

df
      A     B
0  10.0   NaN
1   5.0   NaN
2   2.5  5.00
3  -3.0 -3.00
4  -1.5 -1.50
5   4.0 -0.75
6   2.0  0.00
7   0.0  0.00

edited Jun 22 '18 at 16:51

answered Jun 22 '18 at 16:48

cs95

379,657
97
704
746

There is an extra bracket at the end and it returns this error `NameError: name 'x' is not defined` – rpanai Jun 22 '18 at 16:50
I think you should ask to add this to pandas as a feature, very useful. – kevin_theinfinityfund Nov 30 '22 at 20:33

score 3 · Answer 2 · answered Jun 22 '18 at 17:55

There is a vectorial solution. It partly uses this answer

import pandas as pd
import numpy as np

d = {'A': np.array([10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0]),
     'B': np.array([np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan ])}
df = pd.DataFrame(d)

decay_rate = 2

ddf = df.isnull().cumsum().diff().fillna(0)
ddf = ddf!=0
ddf = ddf.cumsum() - ddf.cumsum()\
                        .where(~ddf)\
                        .ffill()\
                        .fillna(0)
df_filled = df.ffill()/(ddf * decay_rate).replace(0, 1)

Edit: In my experiments this solution is 1.8x faster than the other. It should be interesting to compare the results with the full df.

When I use a non-trivial dataframe with values 0-0.2 the halving seems to break after some N rows. For example on a dataframe with 7 rows I notice the halving stops working around the 3-4th index. — kevin_theinfinityfund, Nov 30 '22 at 19:23

Pandas dataframe forward-fill with decay

2 Answers2