Fillna in multiple columns in place in Python Pandas

Question

I have a pandas dataFrame of mixed types, some are strings and some are numbers. I would like to replace the NAN values in string columns by '.', and the NAN values in float columns by 0.

Consider this small fictitious example:

df = pd.DataFrame({'Name':['Jack','Sue',pd.np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, pd.np.nan, 4.7, 5.6, 6.8],
    'B': [.25, pd.np.nan, pd.np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',pd.np.nan,pd.np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small dataframe, 3 lines is probably ok. In my real example (which I cannot share here due to data confidentiality reasons), I have many more string columns and numeric columns. SO I end up writing many lines just for fillna. Is there a concise way of doing this?

In your real example, for the string columns are the `NaN` or the string `'NaN'`? — EdChum, Jan 21 '16 at 08:51

score 86 · Answer 1 · edited Apr 05 '22 at 12:48

86

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:

df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)

Edit (22 Apr 2021)

Functionality (presumably / apparently) changed since original post, and you can no longer chain 2 inplace fillna() operations. You can still chain, but now must assign that chain to the df instead of modifying in place, e.g. like so:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)

edited Apr 05 '22 at 12:48

nutty about natty

1,267
2
10
17

answered Aug 02 '18 at 13:27

Rob Bulmahn

1,035
8
10

@Make42 Not a coding error; the intention was to replace the values in-place in the dataframe. – Rob Bulmahn Apr 21 '21 at 17:22
1

Please check your answer: If you use inplace, then the return value is `None`. Thus you will will fill `'Name'` and `'City'` with `'.'` in `df`. After that you return `None` and get an error trying to use your second `fillna` on the `None` value. If you now only remove the first `ìnplace=True` you do not get THAT error. However, now you create a data frame in memory (with no Python symbol pointing to it), which nan-values are replaced by 0. The second `fillna` return - again - `None`, which we cannot use. Now you do have the desired data frame somewhere in memory, but you cannot reference it. – Make42 Apr 21 '21 at 19:05
Ugh...it worked as written when it was written, but I guess they've gone and updated something that broke it. – Rob Bulmahn Apr 22 '21 at 20:25
To be honest, I have been using pandas for longer than 2018 and I cannot remember that this was different at any point than it is now, but I could be mistaken. – Make42 Apr 23 '21 at 16:39
`inplace` is the scourge anyway: https://github.com/pandas-dev/pandas/issues/16529. Also, afaik `inplace` has always returned None, that's kind of the point. You either use `inplace`, or you chain, but not both together. To use `inplace` in this scenario you'd need to break it into 2 lines. – fantabolous May 17 '23 at 04:53

Anton Protopopov · Accepted Answer · 2016-01-21T06:49:08.173

57

You could use apply for your columns with checking dtype whether it's numeric or not by checking dtype.kind:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John

edited Jan 21 '16 at 06:49

answered Jan 21 '16 at 06:21

Anton Protopopov

30,354
12
88
93

4

Very elegant! Can you confirm that `'biufc'` is for boolean, integer, unicode, float & complex data types? – Lenwood May 17 '18 at 21:12
1

@Lenwood yes, you could follow the link for [`dtype.kind`](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.dtype.kind.html) to check that. – Anton Protopopov May 18 '18 at 05:15
This is great. What if I wanted to fill in the mean of the column instead of "0"? – Peter May 27 '20 at 20:00
1

@Peter I guess you could replace `x.fillna(0)` -> `x.fillna(x.mean())` – Anton Protopopov May 28 '20 at 10:32

Bob Baxley · Answer 3 · 2020-11-29T21:03:12.097

26

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)

edited Nov 29 '20 at 21:03

answered Jan 21 '16 at 01:17

Bob Baxley

3,551
1
22
28

score 3 · Answer 4 · answered Jan 21 '16 at 02:53

3

define a function:

def myfillna(series):
    if series.dtype is pd.np.dtype(float):
        return series.fillna(0)
    elif series.dtype is pd.np.dtype(object):
        return series.fillna('.')
    else:
        return series

you can add other elif statements if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the dataframe

df = df.apply(myfillna)

this is the same as 'inplace'

answered Jan 21 '16 at 02:53

latorrefabian

1,077
3
9
19

This makes sense to write as a function. But wouldn't we need to write it as `for col in df.columns: df[col]=df.apply(myfillna)` ? The function is returning a series, and we replace the whole dataframe with this. – ozzy Jan 21 '16 at 18:31
no because by default apply has the parameter axis=0 which means to apply the function over each column and then returns the result as a dataframe with the new columns. – latorrefabian Jan 21 '16 at 18:36
so actually df.apply(myfillna) is doing what you suggest behind the scenes. – latorrefabian Jan 21 '16 at 18:40
Ok, it makes sense.. And I have tried it, and it works! Thanks! – ozzy Jan 21 '16 at 18:48

Berel Levy · Answer 5 · 2022-05-17T20:33:06.980

The most concise and readable way to accomplish this, especially with many columns is to use df.select_dtypes.columns. (df.select_dtypes, df.columns)

df.select_dtypes returns a new df containing only the columns that match the dtype you need.

df.columns returns a list of the column names in your df.

Full code:

float_column_names = df.select_dtypes(float).columns
df[float_column_names] = df[float_column_names].fillna(0)

string_column_names = df.select_dtypes(object).columns
df[string_column_names] df[string_column_names].fillna('.')

score 0 · Answer 6 · answered Aug 08 '20 at 03:33

There is a simpler way, that can be done in one line:

df.fillna({'Name':0,'City':0},inplace=True)

Not an awesome improvement but if you multiply it by 100, writting only the column names + ':0' is way faster than copying and pasting everything 100 times.

score 0 · Answer 7 · answered Jan 04 '21 at 19:08

0

If you want to replace a list of columns ("lst") with the same value ("v")

def nan_to_zero(df, lst, v):
    d = {x:v for x in lst}
    df.fillna(d, inplace=True)
    return df

answered Jan 04 '21 at 19:08

Tom

1,003
2
13
25

Devyzr · Answer 8 · 2022-11-01T19:29:25.743

-1

If you don't want to specify individual per-column replacement values, you can do it this way:

df[['Name', 'City']].fillna('.',inplace=True)

If you don't like inplace (like me) you can do it like this:

columns = ['Name', 'City']
df[columns] = df.copy()[columns].fillna('.')

The .copy() is added to avoid the SettingWithCopyWarning, which is designed to warn you that the original values of a dataframe is overwritten, which is what we want.

If you don't like that syntax, you can see this question to see other ways of dealing with this: How to deal with SettingWithCopyWarning in Pandas

edited Nov 01 '22 at 19:29

answered Feb 03 '22 at 17:59

Devyzr

299
5
13

"A value is trying to be set on a copy of a slice from a DataFrame" – Ben Oct 06 '22 at 18:31
@Ben, updated the answer with a way to avoid the warning – Devyzr Nov 01 '22 at 19:30

score -2 · Answer 9 · answered Sep 17 '20 at 09:51

-2

Much easy way is :dt.replace(pd.np.nan, "NA"). In case you want other replacement, you should use the next:dt.replace("pattern", "replaced by (new pattern)")

answered Sep 17 '20 at 09:51

A. chahid

184
2
5

1

This is just a different way of writing the standard `df.fillna('NA', inplace=True)` so you're not really gaining anything, plus it's a less standard way of filling `NaN` values in pandas. – Rob Bulmahn Nov 24 '20 at 00:54
No, its more general than what you think. this "dt.replace("pattern", "replaced by (new pattern)")"is given the programmer more choice to use any pattern to replace. – A. chahid Nov 24 '20 at 21:12
This doesn't answer the question of replacing multiple specific columns at once and advises going against convention for a very standard practice in pandas. – Ben Oct 06 '22 at 18:34

Fillna in multiple columns in place in Python Pandas

9 Answers9

Linked