1

I have a table of various indicators keyed by Date and Code. For each Code, I am trying to fill missing values with the previous day's data or, if that is not available, with the next day's data.

The problem is that when I group by 'Code' and 'Date', nothing happens.

import numpy as np
import pandas as pd

df = pd.DataFrame([['2019-05-01', 'APL', 15951, 303, 49],
                   ['2019-05-02', 'APL', 16075, 301, 46],
                   ['2019-05-03', 'APL', np.nan, 300, 45],
                   ['2019-05-04', 'APL', 15868, 298.8, 33],
                   ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
                   ['2019-05-02', 'MSK', 2224, 243, 53],
                   ['2019-05-03', 'MSK', 2266, 233, 33],
                   ['2019-05-04', 'MSK', np.nan, 253, 55]],
                  columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])

Here is what I am trying:

df.groupby(['Code','Date'])['Price','Volume', 'ATM'].fillna(method = 'ffill')
Mouse on the Keys
Polto

3 Answers

3

You need:

df.groupby(['Code']).apply(lambda x: x.ffill().bfill())

Output:

  Code        Date    Price  Volume   ATM
0  APL  2019-05-01  15951.0   303.0  49.0
1  APL  2019-05-02  16075.0   301.0  46.0
2  APL  2019-05-03  16075.0   300.0  45.0
3  APL  2019-05-04  15868.0   298.8  33.0
4  MSK  2019-05-01   2222.0   243.0  53.0
5  MSK  2019-05-02   2224.0   243.0  53.0
6  MSK  2019-05-03   2266.0   233.0  33.0
7  MSK  2019-05-04   2266.0   253.0  55.0

If you group by ['Code', 'Date'], every (Code, Date) pair becomes its own single-row group. Within such a group there is no previous or next row to fill from, so the missing values stay as they are.
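
A quick way to check this, as a sketch that rebuilds the `df` from the question so it stands alone: count the rows per group for both groupings, then count the NaNs before and after a forward fill on the two-key grouping. The helper names (`sizes_both`, `sizes_code`, `value_cols`) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2019-05-01', 'APL', 15951, 303, 49],
     ['2019-05-02', 'APL', 16075, 301, 46],
     ['2019-05-03', 'APL', np.nan, 300, 45],
     ['2019-05-04', 'APL', 15868, 298.8, 33],
     ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
     ['2019-05-02', 'MSK', 2224, 243, 53],
     ['2019-05-03', 'MSK', 2266, 233, 33],
     ['2019-05-04', 'MSK', np.nan, 253, 55]],
    columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])

# Grouping on both keys: every group holds exactly one row.
sizes_both = df.groupby(['Code', 'Date']).size()

# Grouping on Code only: each group holds that code's four days.
sizes_code = df.groupby('Code').size()

# A forward fill inside single-row groups has nothing to copy from,
# so the number of NaNs does not change.
value_cols = ['Price', 'Volume', 'ATM']
unchanged = df.groupby(['Code', 'Date'])[value_cols].ffill()
nan_before = int(df[value_cols].isna().sum().sum())
nan_after = int(unchanged.isna().sum().sum())

print(sizes_both.max(), sizes_code.tolist(), nan_before, nan_after)
```

With the sample data, the largest two-key group has one row, while each `Code` group has four.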

harvpan
  • When you chain two functions, you need to add the apply here – BENY Aug 05 '19 at 16:25
  • @WeNYoBen how come? I have seen chaining done without apply, many times. – harvpan Aug 05 '19 at 16:31
  • Does this return `Code` column also? – Josmoor98 Aug 05 '19 at 16:37
  • @Josmoor98 yes. Of course, as you can see in the output – harvpan Aug 05 '19 at 16:38
  • Using your solution, the `Code` column is omitted from the result – Josmoor98 Aug 05 '19 at 16:39
  • @Josmoor98, it is not. I have run the code and `Code` is in the output. Did you run the code? :( – harvpan Aug 05 '19 at 16:41
  • Really, that's odd, for me `Code` is missing. Using pandas `0.22.0` in a jupyter notebook FYI – Josmoor98 Aug 05 '19 at 16:44
  • @harvpan edge case: try grouping by id without apply on `df=pd.DataFrame({'id':[1,1,1,1,2,2,2,2],'v':[np.nan,np.nan,np.nan,np.nan,999,999,999,999]})` – BENY Aug 05 '19 at 16:45
  • thanks @harvpan, moving 'Date' away from groupby was the key! – Polto Aug 05 '19 at 16:45
  • @Josmoor98 hmm.. try `df.groupby(['Code'], as_index=False)['Date', 'Price','Volume', 'ATM'].ffill().bfill()` – harvpan Aug 05 '19 at 16:46
  • Still getting the same issue. I only get your output using `df.groupby(['Code']).ffill().bfill()`. With `['Date', 'Price','Volume', 'ATM']` aren't you slicing on those columns? – Josmoor98 Aug 05 '19 at 16:54
  • @harvpan: Read the comment @jerael wrote in reply to me on this exact issue of chaining `.ffill().bfill()` with `groupby`: https://stackoverflow.com/questions/46391128/pandas-fillna-using-groupby/46391144?noredirect=1#comment98276167_46391144 – Andy L. Aug 05 '19 at 16:54
  • @WeNYoBen Didn't realize about this. thanks. Updated the answer. – harvpan Aug 05 '19 at 17:01
  • @AndyL. thanks for the collaborative comments. This really clears up why apply is best for chaining – harvpan Aug 05 '19 at 17:02
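
The chaining issue discussed in these comments can be reproduced with the edge-case frame BENY posted. This is a minimal sketch, not the commenters' own code: it works on the single column `v` and uses `transform` instead of `apply` (an assumption made here so both results have the same shape), and the names `chained` and `per_group` are illustrative.

```python
import numpy as np
import pandas as pd

# Edge case from the comment above: group 1 has no values at all.
df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'v': [np.nan] * 4 + [999] * 4})

# Only ffill() is grouped here. Its result is a plain Series, so the
# bfill() that follows runs across the whole column and group 2's 999
# leaks backwards into group 1.
chained = df.groupby('id')['v'].ffill().bfill()

# Both fills run inside each group, so group 1 stays NaN.
per_group = df.groupby('id')['v'].transform(lambda s: s.ffill().bfill())

print(chained.tolist())
print(per_group.tolist())
```

The first result is filled everywhere, while the second keeps the all-NaN group untouched, which is the behaviour the question asks for ("for each Code").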
1

Here is what you can do:

df.set_index(['Date', 'Code'], inplace=True)
# Note: this forward-fills down the whole column, not separately per Code
df['Price'] = df['Price'].ffill()
df


Asad Rauf
0

To apply this to specific columns only, I use:

for col in list_of_cols:  # Replace with your list of column names
    df[col] = df.groupby('Code')[col].transform(lambda x: x.ffill().bfill())
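
The same idea can probably be written without the loop by selecting all the columns at once before `transform`. A sketch under stated assumptions: the `df` is the one from the question, rebuilt here so the snippet runs on its own, and the contents of `list_of_cols` and the name `filled` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2019-05-01', 'APL', 15951, 303, 49],
     ['2019-05-02', 'APL', 16075, 301, 46],
     ['2019-05-03', 'APL', np.nan, 300, 45],
     ['2019-05-04', 'APL', 15868, 298.8, 33],
     ['2019-05-01', 'MSK', 2222, np.nan, np.nan],
     ['2019-05-02', 'MSK', 2224, 243, 53],
     ['2019-05-03', 'MSK', 2266, 233, 33],
     ['2019-05-04', 'MSK', np.nan, 253, 55]],
    columns=['Date', 'Code', 'Price', 'Volume', 'ATM'])

list_of_cols = ['Price', 'Volume', 'ATM']  # hypothetical column choice

# Work on a copy so the original frame keeps its NaNs for comparison.
filled = df.copy()

# transform returns a frame aligned to the original index, so it can be
# assigned straight back; Date and Code are left untouched.
filled[list_of_cols] = (
    df.groupby('Code')[list_of_cols]
      .transform(lambda s: s.ffill().bfill())
)

print(filled)
```

Because the result is assigned back into the frame, the `Code` column stays in place regardless of how the installed pandas version treats grouping columns.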
Josmoor98