Pandas dataframe fillna() only some columns in place

Question

I am trying to fill None (NaN) values in a pandas DataFrame with 0s, but only for a subset of the columns.

When I do:

import pandas as pd
df = pd.DataFrame(data={'a':[1,2,3,None],'b':[4,5,None,6],'c':[None,None,7,8]})
print(df)
df.fillna(value=0, inplace=True)
print(df)

The output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  NaN  7.0
3  NaN  6.0  8.0
     a    b    c
0  1.0  4.0  0.0
1  2.0  5.0  0.0
2  3.0  0.0  7.0
3  0.0  6.0  8.0

It replaces every None with 0. What I want is to replace the Nones only in columns a and b, not in c.

What is the best way of doing this?

root · Accepted Answer · 2016-06-30T22:19:27.140

400

You can select your desired columns and do it by assignment:

df[['a', 'b']] = df[['a','b']].fillna(value=0)

The resulting output is as expected:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0
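A self-contained version of this approach, with the column list held in a variable (`cols` is only an illustrative name, not part of the original answer), so the same line works for any subset of columns:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

# Columns whose missing values should be filled; 'c' is deliberately left out
cols = ['a', 'b']

# Fill the selected columns on a temporary frame, then write them back
df[cols] = df[cols].fillna(value=0)

print(df)
```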

edited Jun 30 '16 at 22:19

answered Jun 30 '16 at 22:09

root

Yes, this is exactly what I want! Thank you. Any way to do this in place? My original dataframe is pretty big. – Sait Jun 30 '16 at 22:10
I don't think there is any performance gain from doing this in place, as you're overwriting the original df anyway – EdChum Jun 30 '16 at 22:12
The loc is superfluous here, `df[['a', 'b']] = df[['a','b']].fillna(value=0)` will still work – EdChum Jun 30 '16 at 22:13
@EdChum Doesn't it produce a temporary data frame and hence need more memory to do so? (I am concerned more about memory than time complexity.) – Sait Jun 30 '16 at 22:14
But you have to produce a temporary df at some point in the process in case it fails partway through, so there is really no performance difference here between assigning back to yourself and using `inplace=True` – EdChum Jun 30 '16 at 22:16
For many operations, `inplace` will still work on a copy. I don't know if it's the case for `fillna` or not. See [this answer](http://stackoverflow.com/a/22533110/3339965) from one of the pandas core developers. – root Jun 30 '16 at 22:16
@EdChum Thanks, I've removed the `loc`. I've conditioned myself to always use `loc` just to play it safe! – root Jun 30 '16 at 22:20
No problem. I actually was thinking about the same thing you were earlier today in regards to `inplace`, and happened to find the link. That's some nice coincidental timing! – root Jun 30 '16 at 22:23
This just returns the two columns, but not `c`, which is an issue if you're chaining. For example: `df[['a', 'b']].fillna('').groupby(['a', 'b'])`, the `fillna` ensures that otherwise skipped `NaN`s are also included. If you don't chain, I suggest using `df[['a', 'b']].fillna('', inplace=True)` anyway. Alternatively: `df.fillna({'a':'','b':''}).groupby(['a', 'b'])` – Herbert Jun 03 '22 at 08:14
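To illustrate the chaining point, here is a small sketch that assumes the frame from the question and fills with 0 instead of the empty string. The dict form of `fillna` fills only `a` and `b` but keeps `c` in the result, and `groupby` (which drops NaN keys by default) then sees all four rows:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

# Without filling, rows with NaN in a grouping key are dropped by default
groups_raw = df.groupby(['a', 'b']).size()

# The dict form fills only 'a' and 'b' and returns all columns, so it chains cleanly
filled = df.fillna({'a': 0, 'b': 0})
groups_filled = filled.groupby(['a', 'b']).size()

print(groups_raw)
print(groups_filled)
```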

score 196 · Answer 2 · answered Nov 15 '17 at 18:59

You can pass a dict to fillna to use a different fill value for each column:

df.fillna({'a':0,'b':0})
Out[829]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0

To keep the result, assign it back:

df=df.fillna({'a':0,'b':0})
df
Out[831]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0

BENY

Really cool. By the way, for the dict you can use `fromkeys` if you want. +1 – U13-Forward Aug 29 '18 at 01:14
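Following the `fromkeys` suggestion: `dict.fromkeys(cols, 0)` builds `{'a': 0, 'b': 0}` from a list, which is handy when there are many columns. A minimal sketch, assuming the question's frame (the list name `cols` is illustrative):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

cols = ['a', 'b']

# dict.fromkeys maps every listed column to the same fill value
fill_values = dict.fromkeys(cols, 0)

df = df.fillna(fill_values)
print(df)
```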
The answer/example would be clearer if it actually showed different values for the different columns. – RufusVS Sep 21 '18 at 18:00
@RufusVS That's right, but I was still trying to match the OP's expected output. – BENY Sep 21 '18 at 18:04
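For completeness, here is the same dict form with a different fill value per column (the values 0 and -1 are arbitrary choices for this example):

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

# Each key is a column label and each value is what that column's NaNs become;
# columns not in the dict ('c' here) are left untouched
df = df.fillna({'a': 0, 'b': -1})
print(df)
```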
This is a better solution than the accepted answer, because it avoids chained indexing issues, e.g. when used as `df.fillna({'a':0,'b':0}, inplace=True)` – Alex Apr 06 '20 at 10:39
How to use methods like `ffill` or `bfill` inside a dictionary? – shaik moeed Dec 28 '21 at 18:44
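As far as I know, the dict passed to `fillna` maps columns to fill values, not to methods. One workaround, sketched here under that assumption, is to call `ffill()`/`bfill()` on the selected columns directly. The `methods` mapping below is a hypothetical helper, not a pandas feature:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

# Same method for several columns: forward-fill only 'a' and 'b'
ffilled = df.copy()
ffilled[['a', 'b']] = ffilled[['a', 'b']].ffill()

# Different method per column via a hypothetical column -> method-name mapping
methods = {'a': 'ffill', 'b': 'bfill'}
mixed = df.copy()
for col, method in methods.items():
    # getattr(series, 'ffill')() is equivalent to series.ffill()
    mixed[col] = getattr(mixed[col], method)()

print(ffilled)
print(mixed)
```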

score 52 · Answer 3 · answered Jun 10 '18 at 02:22

You can avoid making a copy of the object by using Wen's solution with inplace=True:

df.fillna({'a':0, 'b':0}, inplace=True)
print(df)

Which yields:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0
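One pitfall worth knowing with this variant: with `inplace=True`, `fillna` modifies `df` itself and returns `None`, so its result should not be assigned back. A quick check, assuming the question's frame:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

# Modifies df directly; the return value is None, not a DataFrame
result = df.fillna({'a': 0, 'b': 0}, inplace=True)

print(result)
print(df)
```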

Leesa H.

While this is correct, avoiding a copy [isn't necessarily better](https://stackoverflow.com/a/22533110/9209546). – jpp Nov 16 '18 at 15:25

score 19 · Answer 4 · answered Dec 03 '18 at 20:49

Using the top answer produces a warning about making changes to a copy of a DataFrame slice. Assuming that you have other columns, a better way to do this is to pass a dictionary:

df.fillna({'A': 'NA', 'B': 'NA'}, inplace=True)

Jonathan

score 11 · Answer 5 · answered Jun 15 '21 at 15:53

This should work without triggering a SettingWithCopyWarning:

df[['a', 'b']] = df.loc[:,['a', 'b']].fillna(value=0)

Joshua Z

Josephine M. Ho · Answer 6 · 2018-09-17T21:52:16.277

9

Here's how you can do it all in one line:

df[['a', 'b']].fillna(value=0, inplace=True)

Breakdown: df[['a', 'b']] selects the columns you want to fill NaN values for, value=0 tells it to fill NaNs with zero, and inplace=True will make the changes permanent, without having to make a copy of the object.
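A small check of this one-liner, assuming the question's frame. `df[['a', 'b']]` builds a new DataFrame, so `inplace=True` fills that temporary object rather than `df`. Pandas may also emit a warning here, which this sketch silences:

```python
import warnings

import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

with warnings.catch_warnings():
    # Depending on the pandas version this can emit SettingWithCopyWarning
    # or a chained-assignment warning; ignore it for the demonstration
    warnings.simplefilter("ignore")
    df[['a', 'b']].fillna(value=0, inplace=True)

# The NaNs in 'a' and 'b' are still there: only the temporary copy was filled
print(df)
```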

edited Sep 17 '18 at 21:52

answered Sep 14 '18 at 22:26

Josephine M. Ho

Somehow this gives SettingWithCopyWarning and the change is not reflected in `df`. – Michael Nov 04 '20 at 15:18
Is there a way to fillna(0) on every column to the right of the nth column? – Arthur D. Howland Jan 06 '23 at 14:52
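One way to answer that last question is to reuse the dict approach from another answer here and build the dict from a slice of `df.columns`. The position `n` is a hypothetical parameter counted from 0, so `df.columns[n:]` selects column `n` and everything to its right. With `n = 1`, columns `b` and `c` are filled and `a` is left alone:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

n = 1  # hypothetical 0-based position of the first column to fill

# Map every column from position n onward to 0; earlier columns are untouched
df = df.fillna(dict.fromkeys(df.columns[n:], 0))
print(df)
```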

score 5 · Answer 7 · answered Aug 29 '18 at 01:26

Or something like:

df.loc[df['a'].isnull(),'a']=0
df.loc[df['b'].isnull(),'b']=0

and if there are more columns:

for i in your_list:
    df.loc[df[i].isnull(),i]=0
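Here is a runnable version of the loop, assuming the question's frame. `cols_to_fill` is an illustrative stand-in for `your_list`. Each assignment uses a single `.loc` call with a boolean mask, so it writes to `df` directly rather than to a copy:

```python
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3, None],
                        'b': [4, 5, None, 6],
                        'c': [None, None, 7, 8]})

cols_to_fill = ['a', 'b']  # stands in for `your_list`

for col in cols_to_fill:
    # Boolean mask of the missing entries in this column; set only those cells
    df.loc[df[col].isna(), col] = 0

print(df)
```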

U13-Forward

score 5 · Answer 8 · answered Dec 04 '20 at 06:48

This did NOT work for me (using pandas 0.25.1), because `df[['col1', 'col2']]` returns a new DataFrame, so `inplace=True` only fills that copy:

df[['col1', 'col2']].fillna(value=0, inplace=True)

Another solution:

subset_cols = ['col1','col2']
[df[col].fillna(0, inplace=True) for col in subset_cols]

Example:

import numpy as np
df = pd.DataFrame(data={'col1':[1,2,np.nan,], 'col2':[1,np.nan,3], 'col3':[np.nan,2,3]})

output:

   col1  col2  col3
0  1.00  1.00   nan
1  2.00   nan  2.00
2   nan  3.00  3.00

Apply the list comprehension to fill the NaN values:

subset_cols = ['col1','col2']
[df[col].fillna(0, inplace=True) for col in subset_cols]

Output:

   col1  col2  col3
0  1.00  1.00   nan
1  2.00  0.00  2.00
2  0.00  3.00  3.00
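Note that `df[col].fillna(0, inplace=True)` is itself a form of chained assignment. Recent pandas versions warn about it, and under Copy-on-Write (which pandas 3.0 makes the default) it no longer updates `df`. The following sketch of the same loop assigns each filled column back instead, which should behave the same across versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2, np.nan],
                        'col2': [1, np.nan, 3],
                        'col3': [np.nan, 2, 3]})

subset_cols = ['col1', 'col2']

for col in subset_cols:
    # Assign the filled Series back to the column instead of using inplace=True
    df[col] = df[col].fillna(0)

print(df)
```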

Amir F

I think `inplace` is not good practice, check [this](https://www.dataschool.io/future-of-pandas/#inplace) and [this](https://github.com/pandas-dev/pandas/issues/16529) – jezrael Dec 04 '20 at 06:55
So in my opinion, it would be best if `inplace` raised a warning and was then removed from pandas. – jezrael Dec 04 '20 at 06:56
So, easy advice: always avoid `inplace` and you'll never have this problem ;) – jezrael Dec 04 '20 at 06:57

score 0 · Answer 9 · edited May 22 '20 at 20:30

Sometimes this syntax won't work:

df[['col1','col2']] = df[['col1','col2']].fillna()

Use the following instead:

df['col1','col2']

edited May 22 '20 at 20:30

Jaroslav Bezděk

answered Feb 24 '20 at 10:06

Sarath Baby

score 0 · Answer 10 · answered May 17 '23 at 08:06

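If you're looking for a more efficient way, you can work on the underlying NumPy arrays:

import numpy as np

for col in ['a', 'b']:
    # pass nan= by keyword (the 2nd positional arg is `copy`); note: also maps +/-inf to large finite values
    df[col] = np.nan_to_num(df[col].to_numpy(), nan=0.0)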

Nimrod
