Pandas fill missing values with groupby

Question

I have a table of various indicators keyed by Date and Code. For each Code, I am trying to fill missing values with the previous day's data or, if that is not available, with the next day's data.

The problem is that when I group by 'Code' and 'Date', nothing happens.

import numpy as np
import pandas as pd

df = pd.DataFrame([['2019-05-01', 'APL', 15951, 303, 49],
                   ['2019-05-02', 'APL', 16075, 301, 46],
                   ['2019-05-03', 'APL', np.nan, 300, 45],
                   ['2019-05-04', 'APL', 15868, 298.8, 33],
                   ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
                   ['2019-05-02', 'MSK', 2224, 243, 53],
                   ['2019-05-03', 'MSK', 2266, 233, 33],
                   ['2019-05-04', 'MSK', np.nan, 253, 55]],
                  columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])

Here is what I am trying:

df.groupby(['Code', 'Date'])[['Price', 'Volume', 'ATM']].fillna(method='ffill')

harvpan · Accepted Answer · score 3 · 2019-08-05T17:00:54.263

You need:

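# group_keys=False keeps the original flat index; pandas >= 2.0 otherwise prepends Code to it
df.groupby('Code', group_keys=False).apply(lambda x: x.ffill().bfill())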

Output:

    Code      Date  Price   Volume  ATM
0   APL 2019-05-01  15951.0 303.0   49.0
1   APL 2019-05-02  16075.0 301.0   46.0
2   APL 2019-05-03  16075.0 300.0   45.0
3   APL 2019-05-04  15868.0 298.8   33.0
4   MSK 2019-05-01  2222.0  243.0   53.0
5   MSK 2019-05-02  2224.0  243.0   53.0
6   MSK 2019-05-03  2266.0  233.0   33.0
7   MSK 2019-05-04  2266.0  253.0   55.0

If you group by ['Date', 'Code'], each (Date, Code) pair becomes its own one-row group, so within a group there is no previous or next row to fill from and the missing values stay as they are.
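To see the difference on the question's data, here is a minimal sketch (the names `cols`, `pair_sizes`, `by_pair` and `by_code` are illustrative only). It checks the group sizes and counts how many of the frame's 4 missing values remain after a grouped forward fill:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2019-05-01', 'APL', 15951, 303, 49],
     ['2019-05-02', 'APL', 16075, 301, 46],
     ['2019-05-03', 'APL', np.nan, 300, 45],
     ['2019-05-04', 'APL', 15868, 298.8, 33],
     ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
     ['2019-05-02', 'MSK', 2224, 243, 53],
     ['2019-05-03', 'MSK', 2266, 233, 33],
     ['2019-05-04', 'MSK', np.nan, 253, 55]],
    columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])
cols = ['Price', 'Volume', 'ATM']

# Every (Code, Date) pair occurs exactly once, so each group is a single row.
pair_sizes = df.groupby(['Code', 'Date']).size()
print(pair_sizes.max())  # 1

# A one-row group has no neighbour to copy from, so nothing gets filled.
by_pair = df.groupby(['Code', 'Date'])[cols].ffill()
print(int(by_pair.isna().sum().sum()))  # 4

# Grouping by Code alone lets ffill see the other days of the same Code.
by_code = df.groupby('Code')[cols].ffill()
print(int(by_code.isna().sum().sum()))  # 2
```

The two values still missing in `by_code` are MSK's first-day Volume and ATM, which have no previous day; that is why the answer chains `bfill()` after `ffill()`.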

edited Aug 05 '19 at 17:00

answered Aug 05 '19 at 16:13

When you chain two functions, you need to add the apply here – BENY Aug 05 '19 at 16:25
@WeNYoBen how come? I have seen chaining done without apply, many times. – harvpan Aug 05 '19 at 16:31
Does this return `Code` column also? – Josmoor98 Aug 05 '19 at 16:37
@Josmoor98 yes. Of course, as you can see in the output – harvpan Aug 05 '19 at 16:38
Using your solution, the `Code` column is omitted from the result – Josmoor98 Aug 05 '19 at 16:39
@Josmoor98, it is not. I have run the code and `Code` is in the output. Did you run the code? :( – harvpan Aug 05 '19 at 16:41
Really, that's odd, for me `Code` is missing. Using pandas `0.22.0` in a Jupyter notebook FYI – Josmoor98 Aug 05 '19 at 16:44
@harvpan edge case: try grouping by id without apply, using `df=pd.DataFrame({'id':[1,1,1,1,2,2,2,2],'v':[np.nan,np.nan,np.nan,np.nan,999,999,999,999]}) ` – BENY Aug 05 '19 at 16:45
thanks @harvpan, moving 'Date' out of the groupby was the key! – Polto Aug 05 '19 at 16:45
@Josmoor98 hmm.. try `df.groupby(['Code'], as_index=False)['Date', 'Price','Volume', 'ATM'].ffill().bfill()` – harvpan Aug 05 '19 at 16:46
Still getting the same issue. I only get your output using `df.groupby(['Code']).ffill().bfill()`. With `['Date', 'Price','Volume', 'ATM']` aren't you slicing on those columns? – Josmoor98 Aug 05 '19 at 16:54
@harvpan: Read the comment @jerael wrote in reply to me on the exact issue of chaining `.ffill().bfill()` with `groupby`: https://stackoverflow.com/questions/46391128/pandas-fillna-using-groupby/46391144?noredirect=1#comment98276167_46391144 – Andy L. Aug 05 '19 at 16:54
@WeNYoBen Didn't realize this, thanks. Updated the answer. – harvpan Aug 05 '19 at 17:01
@AndyL. thanks for the collaborative comments. This really clears up why apply is best for chaining – harvpan Aug 05 '19 at 17:02
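BENY's edge case can be reproduced with a minimal sketch (`leaky` and `safe` are illustrative names). In the chained form only `ffill()` is grouped: it returns an ordinary DataFrame, so the following `bfill()` runs over the whole frame and crosses group boundaries.

```python
import numpy as np
import pandas as pd

# BENY's edge case: group 1 is all NaN, group 2 is all 999.
df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'v': [np.nan] * 4 + [999] * 4})

# Grouped ffill leaves group 1 as NaN, then the ungrouped bfill
# copies group 2's 999 backwards into group 1.
leaky = df.groupby('id').ffill().bfill()
print(leaky['v'].tolist())  # [999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0]

# Running both fills inside one per-group function keeps them within each id.
safe = df.groupby('id')['v'].transform(lambda s: s.ffill().bfill())
print(safe.tolist())  # [nan, nan, nan, nan, 999.0, 999.0, 999.0, 999.0]
```

Group 1 has no value at all, so the correct result leaves it missing; `apply` or `transform` with a lambda gives that.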

score 1 · Answer 2 · answered Aug 05 '19 at 16:14

Here is what you can do:

df = df.set_index(['Date', 'Code'])
df['Price'] = df['Price'].ffill()  # not grouped: fills across Code boundaries
df
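This answer fills only `Price` and does not group by `Code`. It happens to work here because each Code's first `Price` is present, but a column whose first value for a Code is missing would borrow the previous Code's value. A small sketch on the question's data (the `plain` and `grouped` names are illustrative; `set_index` is skipped because it does not change the fill order):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2019-05-01', 'APL', 15951, 303, 49],
     ['2019-05-02', 'APL', 16075, 301, 46],
     ['2019-05-03', 'APL', np.nan, 300, 45],
     ['2019-05-04', 'APL', 15868, 298.8, 33],
     ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
     ['2019-05-02', 'MSK', 2224, 243, 53],
     ['2019-05-03', 'MSK', 2266, 233, 33],
     ['2019-05-04', 'MSK', np.nan, 253, 55]],
    columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])

# Ungrouped: MSK's missing first Volume (row 4) takes APL's last Volume.
plain = df['Volume'].ffill()
print(plain[4])  # 298.8

# Grouped by Code with a back fill: MSK takes its own next-day Volume instead.
grouped = df.groupby('Code')['Volume'].transform(lambda s: s.ffill().bfill())
print(grouped[4])  # 243.0
```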

answered Aug 05 '19 at 16:14

Asad Rauf

score 0 · Answer 3 · answered Aug 05 '19 at 17:07

To fill only specific columns, I use:

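list_of_cols = ['Price', 'Volume', 'ATM']  # replace with your list of column names
for col in list_of_cols:
    # Forward-fill, then back-fill, separately within each Code
    df[col] = df.groupby('Code')[col].transform(lambda x: x.ffill().bfill())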
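A possible variant, not taken from this answer: pandas' built-in grouped `ffill()` and `bfill()` can replace the per-column lambda. This sketch assumes the rows are already ordered by Date within each Code and reuses `list_of_cols` from above as an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2019-05-01', 'APL', 15951, 303, 49],
     ['2019-05-02', 'APL', 16075, 301, 46],
     ['2019-05-03', 'APL', np.nan, 300, 45],
     ['2019-05-04', 'APL', 15868, 298.8, 33],
     ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
     ['2019-05-02', 'MSK', 2224, 243, 53],
     ['2019-05-03', 'MSK', 2266, 233, 33],
     ['2019-05-04', 'MSK', np.nan, 253, 55]],
    columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])
list_of_cols = ['Price', 'Volume', 'ATM']

# Step 1: forward-fill within each Code (returns only list_of_cols, same index).
forward = df.groupby('Code')[list_of_cols].ffill()

# Step 2: back-fill within each Code; `forward` has no Code column, so group by
# the df['Code'] Series, which aligns on the shared index.
df[list_of_cols] = forward.groupby(df['Code']).bfill()
print(df)
```

Both steps stay inside the groups, so the result should match the accepted answer's output without calling a Python function per group.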

answered Aug 05 '19 at 17:07

Josmoor98
