How to replace NaNs by preceding or next values in pandas DataFrame?

Question

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9

What I need to do is replace every NaN with the first non-NaN value in the same column above it. It is assumed that the first row will never contain a NaN. So for the previous example the result would be:

   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

I can just loop through the whole DataFrame column by column, element by element, and set the values directly, but is there an easy (ideally loop-free) way of achieving this?

Alex Riley · Accepted Answer · score 393 · 2015-01-12T15:38:35.513

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

This method will "propagate[s] last valid observation forward to next valid".

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
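In pandas 2.1 and later, passing method= to fillna emits a FutureWarning because that keyword is deprecated; the dedicated ffill and bfill methods do the same job. A minimal sketch on the question's data, assuming a recent pandas version:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the last non-NaN value above it in the same column.
forward = df.ffill()

# Backward fill: each NaN takes the next non-NaN value below it. NaNs with no
# valid value below them (row 2 of column 0, rows 1-2 of column 1) stay NaN.
backward = df.bfill()

print(forward)
print(backward)
```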

edited Jan 12 '15 at 15:38

answered Jan 12 '15 at 15:25


What if the blank cell was in the column names index (i.e., a couple of the columns didn't have names but did have data)? Is there a way to use bfill or ffill to fill the blank column index cell with the cell in the row immediately below it? For instance: df = pd.DataFrame({'col1': [2, 4, 8], 'col2': [2, 0, 0], '': [10, 2, 1]}, index=['falcon', 'dog', 'spider']) How could I use bfill or ffill to change the name of the third column to 10 (which is the value of the row immediately below the blank third column name)? Thanks! – BGG16 Aug 03 '20 at 17:58

score 58 · Answer 2 · answered Jun 01 '17 at 22:53

The accepted answer is perfect. I had a related but slightly different situation where I had to fill forward, but only within groups. In case someone has the same need, know that fillna also works on GroupBy objects.

>>> import numpy as np
>>> import pandas as pd
>>> example = pd.DataFrame({'name': list('aaabbbcccc'), 'number': [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9]})
>>> example
  name  number
0    a     0.0
1    a     1.0
2    a     2.0
3    b     NaN
4    b     4.0
5    b     NaN
6    c     6.0
7    c     7.0
8    c     8.0
9    c     9.0
>>> example.groupby('name')['number'].fillna(method='ffill') # fill in row 5 but not row 3
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    4.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: number, dtype: float64
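Recent pandas releases (2.x) deprecate passing method= to fillna, including on GroupBy objects; GroupBy.ffill gives the same within-group forward fill. A minimal sketch rebuilding the example frame above so the block runs on its own:

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    "name": list("aaabbbcccc"),
    "number": [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
})

# GroupBy.ffill fills forward only within each 'name' group, so row 3 (the
# first row of group 'b') stays NaN, while row 5 picks up 4.0 from row 4.
filled = example.groupby("name")["number"].ffill()
print(filled)

# The result keeps the original index, so it can be assigned straight back.
example["number"] = filled
```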

exactly what I was looking for, ty – Tony Sep 22 '17 at 11:18

score 22 · Answer 3 · answered Mar 17 '17 at 04:47

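One thing that I noticed when trying this solution: if you have NaNs at the start or the end of the array, ffill or bfill alone doesn't quite work. ffill cannot fill leading NaNs (there is no earlier value to carry forward), and bfill cannot fill trailing NaNs. You need both.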

In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

In [225]: df.ffill()
Out[225]:
     0
0  NaN
1  1.0
...
7  6.0
8  6.0

In [226]: df.bfill()
Out[226]:
     0
0  1.0
1  1.0
...
7  6.0
8  NaN

In [227]: df.bfill().ffill()
Out[227]:
     0
0  1.0
1  1.0
...
7  6.0
8  6.0
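Which fill runs first matters for NaNs in the middle of the data: bfill().ffill() fills an interior NaN with the next value, while ffill().bfill() fills it with the previous one. A small sketch on a Series version of the data above:

```python
import pandas as pd

s = pd.Series([None, 1, 2, 3, None, 4, 5, 6, None], dtype="float64")

# bfill first: the interior NaN (position 4) takes the NEXT value (4.0);
# ffill then handles the trailing NaN.
next_first = s.bfill().ffill()

# ffill first: the interior NaN takes the PREVIOUS value (3.0);
# bfill then handles the leading NaN.
prev_first = s.ffill().bfill()

print(pd.DataFrame({"bfill().ffill()": next_first, "ffill().bfill()": prev_first}))
```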

answered Mar 17 '17 at 04:47

jjs

Brilliant. I needed exactly this for my problem. Filling both before and after. Thanks a lot. – Prometheus Apr 22 '18 at 16:46
Great. I need this solution. Thanks – some_programmer Mar 09 '19 at 15:44

Ffisegydd · Answer 4 · 2015-01-12T15:32:01.573

You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and will propagate the last valid observation forward. The alternative is 'bfill', which works the same way, but backwards.

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df = df.fillna(method='ffill')

print(df)
#   0  1  2
#0  1  2  3
#1  4  2  3
#2  4  2  9

There is also a direct synonym function for this, pandas.DataFrame.ffill, to make things simpler.

score 13 · Answer 5 · answered Dec 19 '18 at 07:45

Single-column version

Fill NaN with the last valid value:

df[column_name] = df[column_name].ffill()

Fill NaN with the next valid value:

df[column_name] = df[column_name].bfill()

answered Dec 19 '18 at 07:45

SpiralDev

How do I fill values in multiple columns, but not all? – Alex May 04 '22 at 11:53
@Alex check this, it should help: https://stackoverflow.com/questions/34913590/fillna-in-multiple-columns-in-place-in-python-pandas – Alex May 15 '23 at 10:25
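Selecting a list of columns fills several, but not all, columns at once. Assigning the result back, rather than passing inplace=True to a column selection such as df[col].fillna(...), also keeps working under pandas' Copy-on-Write mode, where that chained in-place call no longer updates df. A sketch with hypothetical columns a, b and c:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only columns "a" and "b" should be filled.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
    "c": [7.0, np.nan, np.nan],
})

# Single column: assign the filled Series back.
df["a"] = df["a"].ffill()

# Several (but not all) columns: select them, fill, and assign back.
# Column "c" is untouched and keeps its NaNs.
cols = ["a", "b"]
df[cols] = df[cols].bfill()

print(df)
```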

Suvo · Answer 6 · 2019-12-22T03:07:56.560

I agree with the ffill approach, but one extra piece of information: you can limit the forward fill with the keyword argument limit.

>>> import pandas as pd    
>>> df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])

>>> df
     0    1   2
0  1.0  2.0   3
1  NaN  NaN   6
2  NaN  NaN   9

>>> df[1] = df[1].ffill()
>>> df
     0    1    2
0  1.0  2.0    3
1  NaN  2.0    6
2  NaN  2.0    9

Now with the limit keyword argument, which caps how many consecutive NaNs get filled:

>>> df[0] = df[0].ffill(limit=1)

>>> df
     0    1  2
0  1.0  2.0  3
1  1.0  2.0  6
2  NaN  2.0  9
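The limit applies to each run of consecutive NaNs separately, not to the column as a whole. A small sketch on a made-up Series with two gaps:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan, np.nan])

# limit=1: in each run of consecutive NaNs, only the first NaN is filled.
filled_1 = s.ffill(limit=1)   # [1, 1, NaN, NaN, 5, 5, NaN]

# limit=2: up to two NaNs per run are filled.
filled_2 = s.ffill(limit=2)   # [1, 1, 1, NaN, 5, 5, 5]

print(filled_1.tolist())
print(filled_2.tolist())
```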

piRSquared · Answer 7 · score 9 · 2017-03-31T00:57:37.117

ffill now has its own method, pd.DataFrame.ffill:

df.ffill()

     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0

edited Mar 31 '17 at 00:57

answered Mar 31 '17 at 00:50


score 9 · Answer 8 · answered Dec 22 '19 at 03:29

You can use fillna either to fill NaN values from neighboring rows or to replace them with a constant.

Fill from the previous row

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

df.fillna(method='ffill')
     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0

Replace with a constant

df.fillna(0)  # 0 is the value that replaces every NaN
     0    1    2
0  1.0  2.0  3.0
1  4.0  0.0  0.0
2  0.0  0.0  9.0

Reference pandas.DataFrame.fillna
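fillna also accepts a dict that maps column labels to replacement values, which helps when different columns need different fill values. A small sketch on the same data; the values 0 and -1 are arbitrary:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Column 0 gets 0, column 1 gets -1; column 2 is not listed, so its NaN stays.
replaced = df.fillna({0: 0, 1: -1})
print(replaced)
```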

score 2 · Answer 9 · answered Dec 10 '22 at 02:31

There's also DataFrame.interpolate, which I think gives more control:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

df = df.interpolate(method="pad", limit=None, downcast="infer")  # downcast="infer" turns the filled float columns back into int columns
print(df)

   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9
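The extra control is mostly in interpolate's own methods, which estimate values instead of copying them. In recent pandas (2.1+), fill-style methods such as "pad" in interpolate, and its downcast keyword, are deprecated in favor of ffill/bfill. A sketch contrasting the default linear interpolation with a forward fill:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Default method="linear": an interior NaN is estimated from its neighbors,
# so column 2, row 1 becomes (3 + 9) / 2 = 6.0.
linear = df.interpolate()

# Forward fill copies the previous value instead, giving 3.0 there.
carried = df.ffill()

print(linear)
print(carried)
```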

score 1 · Answer 10 · answered Jul 19 '18 at 15:11

In my case, we have time series from different devices, and some devices did not send any value during some periods. So we first create NA values for every device and time period (the pivot does this), and after that call fillna.

import pandas as pd

df = pd.DataFrame([["device1", 1, 'first val of device1'], ["device2", 2, 'first val of device2'], ["device3", 3, 'first val of device3']])
df.pivot(index=1, columns=0, values=2).fillna(method='ffill').unstack().reset_index(name='value')

Result:

        0   1   value
0   device1     1   first val of device1
1   device1     2   first val of device1
2   device1     3   first val of device1
3   device2     1   None
4   device2     2   first val of device2
5   device2     3   first val of device2
6   device3     1   None
7   device3     2   None
8   device3     3   first val of device3
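A sketch of the same idea without pivoting, assuming hypothetical column names device, t and value: build the full (device, t) grid with MultiIndex.from_product, reindex onto it so missing combinations become NaN, and forward fill within each device with groupby(...).ffill().

```python
import pandas as pd

# Hypothetical readings: device2 and device3 are missing early time steps.
readings = pd.DataFrame({
    "device": ["device1", "device2", "device3"],
    "t": [1, 2, 3],
    "value": ["first val of device1", "first val of device2", "first val of device3"],
})

# One row per (device, t) combination; missing combinations become NaN.
full_index = pd.MultiIndex.from_product(
    [readings["device"].unique(), sorted(readings["t"].unique())],
    names=["device", "t"],
)
grid = readings.set_index(["device", "t"]).reindex(full_index)

# Forward fill within each device only, so values never leak across devices.
result = grid.groupby(level="device").ffill().reset_index()
print(result)
```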
