
I need to fill missing dates down within each group. Each date in the `fill` column should be carried down to the following rows only until the next non-missing `fill` date appears, and only until the group in `name` changes. Here is the code to create the data frame:

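    import numpy as np
    import pandas as pd

    data = {'tdate': [20080815, 20080915, 20081226, 20090110, 20090131, 20080807, 20080831,
                      20080918, 20081023, 20081114, 20081207, 20090117, 20090203, 20090219,
                      20090305, 20090318, 20090501],
            'name': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
            # missing dates are np.nan (the bare name NaN is not defined)
            'fill': [np.nan, np.nan, 20080915, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 20081023,
                     np.nan, np.nan, np.nan, np.nan, 20090219, np.nan, np.nan]}

    df = pd.DataFrame(data, columns=['tdate', 'name', 'fill'])
    df  # displays the frame in IPython/Jupyter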

Current data frame:

           tdate  name      fill
     0  20080815     A       NaN
     1  20080915     A       NaN
     2  20081226     A  20080915
     3  20090110     A       NaN
     4  20090131     A       NaN
     5  20080807     B       NaN
     6  20080831     B       NaN
     7  20080918     B       NaN
     8  20081023     B       NaN
     9  20081114     B  20081023
    10  20081207     B       NaN
    11  20090117     B       NaN
    12  20090203     B       NaN
    13  20090219     B       NaN
    14  20090305     B  20090219
    15  20090318     B       NaN
    16  20090501     B       NaN

Desired output:

           tdate  name      fill
     0  20080815     A       NaN
     1  20080915     A       NaN
     2  20081226     A  20080915
     3  20090110     A  20080915
     4  20090131     A  20080915
     5  20080807     B       NaN
     6  20080831     B       NaN
     7  20080918     B       NaN
     8  20081023     B       NaN
     9  20081114     B       NaN
    10  20081207     B  20081023
    11  20090117     B  20081023
    12  20090203     B  20081023
    13  20090219     B  20081023
    14  20090305     B  20081023
    15  20090318     B  20090219
    16  20090501     B  20090219

Here is my attempt, which fails with an `AttributeError` because there is no `fill` method:

    df.groupby(df["name"])["fill"].fill()
Erich

1 Answer


You were pretty close; you just need to forward-fill with `ffill()` instead of calling `fill()`:

    >>> df.groupby('name')['fill'].ffill()
    0          NaN
    1          NaN
    2     20080915
    3     20080915
    4     20080915
    5          NaN
    6          NaN
    7          NaN
    8          NaN
    9     20081023
    10    20081023
    11    20081023
    12    20081023
    13    20081023
    14    20090219
    15    20090219
    16    20090219
    dtype: float64
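The expression above returns a new Series and leaves `df` unchanged. Here is a minimal sketch of writing the result back into the frame. It rebuilds the question's data with `np.nan` for the missing values and contrasts the grouped fill with an ungrouped `ffill()`. The variable name `leaky` is only for illustration:

```python
import numpy as np
import pandas as pd

# The question's data, using np.nan for the missing fill dates.
data = {
    'tdate': [20080815, 20080915, 20081226, 20090110, 20090131, 20080807, 20080831,
              20080918, 20081023, 20081114, 20081207, 20090117, 20090203, 20090219,
              20090305, 20090318, 20090501],
    'name': ['A'] * 5 + ['B'] * 12,
    'fill': [np.nan, np.nan, 20080915, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
             20081023, np.nan, np.nan, np.nan, np.nan, 20090219, np.nan, np.nan],
}
df = pd.DataFrame(data, columns=['tdate', 'name', 'fill'])

# An ungrouped ffill ignores 'name': A's last date leaks into B's first rows.
leaky = df['fill'].ffill()

# The grouped ffill restarts in each group, so B's leading rows stay NaN.
# The result keeps df's index, so it can be assigned straight back.
df['fill'] = df.groupby('name')['fill'].ffill()
```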

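Or equivalently (note that the `method` argument of `fillna` is deprecated as of pandas 2.1, so the `ffill()` form is preferred on recent versions):

    df.groupby('name')['fill'].fillna(method='ffill')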
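To build intuition, the grouped forward fill can be sketched in plain Python. The idea is to remember the last non-missing value seen for each group and reuse it for that group's missing entries. `group_ffill` is a hypothetical helper written for illustration and is not a pandas function. It assumes missing values are `NaN` or `None`:

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence


def group_ffill(keys: Sequence[Hashable],
                values: Sequence[Optional[float]]) -> List[float]:
    """Hypothetical helper (not a pandas API): forward-fill values within each key."""
    last_seen: Dict[Hashable, float] = {}  # key -> latest non-missing value in that group
    out: List[float] = []
    for key, value in zip(keys, values):
        if value is not None and not math.isnan(value):
            last_seen[key] = value  # a new value replaces what this group carries down
            out.append(value)
        else:
            # Missing: reuse this group's last value, or NaN if the group has none yet.
            out.append(last_seen.get(key, math.nan))
    return out


nan = math.nan
names = ['A'] * 5 + ['B'] * 12
fill = [nan, nan, 20080915, nan, nan, nan, nan, nan, nan, 20081023,
        nan, nan, nan, nan, 20090219, nan, nan]
result = group_ffill(names, fill)
print(result)
```

The helper tracks the last value per key instead of resetting whenever the key changes. Because of that, it should also match `groupby(...).ffill()` when a group's rows are not contiguous.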
Marius