1

The original dataset is:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 NaN
A 1997 NaN
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 NaN
B 1995 NaN
B 1998 NaN
B 2001 1
B 2002 NaN

I want to do forward fill by group and conditional on the value of column 'Year': forward fill missing value until the 'Year' is more than five years apart. For example, the value for group A in Year 1992 is 1, so the value for group A in 1995 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1997 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1998 should not be forward filled with 1 since 1998-1992=6 > 5.

The dataset I want is as follows:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 1
A 1997 1
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 1
B 1995 1
B 1998 NaN
B 2001 1
B 2002 1
mozway
  • 194,879
  • 13
  • 39
  • 75
leilei
  • 37
  • 5

2 Answers2

2

You can use a double groupby.ffill and mask with where:

# identify rows within 5 of the previous non-NA value
m = (df['Year'].where(df['Value'].notna())
     .groupby(df['Group']).ffill()
     .rsub(df['Year']).le(5)
    )

# groupby.ffill and mask
df['Value'] = df.groupby('Group')['Value'].ffill().where(m)

Output:

   Group  Year  Value
0      A  1990    NaN
1      A  1992    1.0
2      A  1995    1.0
3      A  1997    1.0
4      A  1998    NaN
5      A  2001    NaN
6      A  2002    1.0
7      B  1991    1.0
8      B  1992    1.0
9      B  1995    1.0
10     B  1998    NaN
11     B  2001    1.0
12     B  2002    1.0
mozway
  • 194,879
  • 13
  • 39
  • 75
0

You can use cummax to identify 5-year ranges:

x = df['Year'].mask(df['Value'].notna(), df['Year'] + 5).groupby(df['Group']).cummax()

df['Value'] = df.groupby([df['Group'], x])['Value'].ffill()

Result:

   Group  Year  Value
0      A  1990    NaN
1      A  1992    1.0
2      A  1995    1.0
3      A  1997    1.0
4      A  1998    NaN
5      A  2001    NaN
6      A  2002    1.0
7      B  1991    1.0
8      B  1992    1.0
9      B  1995    1.0
10     B  1998    NaN
11     B  2001    1.0
12     B  2002    1.0
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73