7

I have dataframe as such:

df = pd.DataFrame({'val': [np.nan,np.nan,np.nan,np.nan, 15, 1, 5, 2,np.nan, np.nan, np.nan, np.nan,np.nan,np.nan,2,23,5,12, np.nan np.nan, 3,4,5]})
df['name'] = ['a']*8 + ['b']*15

df

>>> 
    val name
0   NaN    a
1   NaN    a
2   NaN    a
3   NaN    a
4   15.0   a
5   1.0    a
6   5.0    a
7   2.0    a
8   NaN    b
9   NaN    b
10  NaN    b
11  NaN    b
12  NaN    b
13  NaN    b
14  2.0    b
15  23.0   b
16  5.0    b
17  12.0   b
18  NaN    b
19  NaN    b
20  3.0    b
21  4.0    b
22  5.0    b

For each name i want to backfill the prior 3 na spots with -1 so that I end up with

>>>
    val name
0   NaN     a
1   -1.0    a
2   -1.0    a
3   -1.0    a
4   15.0    a
5   1.0     a
6   5.0     a
7   2.0     a
8   NaN     b
9   NaN     b
10  NaN     b
11  -1.0    b
12  -1.0    b
13  -1.0    b
14  2.0     b
15  23.0    b
16  5.0     b
17  12.0    b
18  -1      b
19  -1      b
20  3.0     b
21  4.0     b
22  5.0     b

Note there can be multiple sections with NaN. If a section has less than 3 nans it will fill all of them (it backfills all up to 3).

RSHAP
  • 2,337
  • 3
  • 28
  • 39

1 Answers1

5

You can using first_valid_index, return the first not null value of each group then assign the -1 in by using the loc

idx=df.groupby('name').val.apply(lambda x : x.first_valid_index())
for x in idx:
    df.loc[x - 3:x - 1, 'val'] = -1

df
Out[51]: 
     val name
0    NaN    a
1   -1.0    a
2   -1.0    a
3   -1.0    a
4   15.0    a
5    1.0    a
6    5.0    a
7    2.0    a
8    NaN    b
9    NaN    b
10   NaN    b
11  -1.0    b
12  -1.0    b
13  -1.0    b
14   2.0    b
15  23.0    b
16   5.0    b
17  12.0    b

Update

s=df.groupby('name').val.bfill(limit=3)
s.loc[s.notnull()&df.val.isnull()]=-1
s
Out[59]: 
0      NaN
1     -1.0
2     -1.0
3     -1.0
4     15.0
5      1.0
6      5.0
7      2.0
8      NaN
9      NaN
10     NaN
11    -1.0
12    -1.0
13    -1.0
14     2.0
15    23.0
16     5.0
17    12.0
18     NaN
19    -1.0
20    -1.0
21    -1.0
22     3.0
23     4.0
24     5.0
Name: val, dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • I'm sorry I was not clear enough in my question. There can be multiple instances where there are streaks of nan for each name. See update to question. – RSHAP Jun 07 '18 at 21:12
  • @RSHAP check the update , you just need to assign s back to your val columns – BENY Jun 07 '18 at 21:17
  • 1
    Nice answer! How will this handle if there aren't 3 NaN's in a row? Will it overwrite another value? – user3483203 Jun 07 '18 at 21:22
  • good point @user3483203 I updated question so the second part of `b` only has 2 nans. I believe this solution would change the value at index 17 – RSHAP Jun 07 '18 at 21:27
  • 1
    @RSHAP it will not , since I bring two condition , previous one must be NaN and the current one is not NaN – BENY Jun 07 '18 at 21:28