61

I have a pandas dataFrame of mixed types, some are strings and some are numbers. I would like to replace the NAN values in string columns by '.', and the NAN values in float columns by 0.

Consider this small fictitious example:

df = pd.DataFrame({'Name':['Jack','Sue',pd.np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, pd.np.nan, 4.7, 5.6, 6.8],
    'B': [.25, pd.np.nan, pd.np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',pd.np.nan,pd.np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small dataframe, 3 lines is probably ok. In my real example (which I cannot share here due to data confidentiality reasons), I have many more string columns and numeric columns. SO I end up writing many lines just for fillna. Is there a concise way of doing this?

Amir
  • 10,600
  • 9
  • 48
  • 75
ozzy
  • 643
  • 1
  • 5
  • 7

9 Answers9

86

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:

df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)

Edit (22 Apr 2021)

Functionality (presumably / apparently) changed since original post, and you can no longer chain 2 inplace fillna() operations. You can still chain, but now must assign that chain to the df instead of modifying in place, e.g. like so:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
nutty about natty
  • 1,267
  • 2
  • 10
  • 17
Rob Bulmahn
  • 1,035
  • 8
  • 10
  • @Make42 Not a coding error; the intention was to replace the values in-place in the dataframe. – Rob Bulmahn Apr 21 '21 at 17:22
  • 1
    Please check your answer: If you use inplace, then the return value is `None`. Thus you will will fill `'Name'` and `'City'` with `'.'` in `df`. After that you return `None` and get an error trying to use your second `fillna` on the `None` value. If you now only remove the first `ìnplace=True` you do not get THAT error. However, now you create a data frame in memory (with no Python symbol pointing to it), which nan-values are replaced by 0. The second `fillna` return - again - `None`, which we cannot use. Now you do have the desired data frame somewhere in memory, but you cannot reference it. – Make42 Apr 21 '21 at 19:05
  • Ugh...it worked as written when it was written, but I guess they've gone and updated something that broke it. – Rob Bulmahn Apr 22 '21 at 20:25
  • To be honest, I have been using pandas for longer than 2018 and I cannot remember that this was different at any point than it is now, but I could be mistaken. – Make42 Apr 23 '21 at 16:39
  • `inplace` is the scourge anyway: https://github.com/pandas-dev/pandas/issues/16529. Also, afaik `inplace` has always returned None, that's kind of the point. You either use `inplace`, or you chain, but not both together. To use `inplace` in this scenario you'd need to break it into 2 lines. – fantabolous May 17 '23 at 04:53
57

You could use apply for your columns with checking dtype whether it's numeric or not by checking dtype.kind:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
26

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)
Bob Baxley
  • 3,551
  • 1
  • 22
  • 28
3

define a function:

def myfillna(series):
    if series.dtype is pd.np.dtype(float):
        return series.fillna(0)
    elif series.dtype is pd.np.dtype(object):
        return series.fillna('.')
    else:
        return series

you can add other elif statements if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the dataframe

df = df.apply(myfillna)

this is the same as 'inplace'

latorrefabian
  • 1,077
  • 3
  • 9
  • 19
  • This makes sense to write as a function. But wouldn't we need to write it as `for col in df.columns: df[col]=df.apply(myfillna)` ? The function is returning a series, and we replace the whole dataframe with this. – ozzy Jan 21 '16 at 18:31
  • no because by default apply has the parameter axis=0 which means to apply the function over each column and then returns the result as a dataframe with the new columns. – latorrefabian Jan 21 '16 at 18:36
  • so actually df.apply(myfillna) is doing what you suggest behind the scenes. – latorrefabian Jan 21 '16 at 18:40
  • Ok, it makes sense.. And I have tried it, and it works! Thanks! – ozzy Jan 21 '16 at 18:48
1

The most concise and readable way to accomplish this, especially with many columns is to use df.select_dtypes.columns. (df.select_dtypes, df.columns)

df.select_dtypes returns a new df containing only the columns that match the dtype you need.

df.columns returns a list of the column names in your df.

Full code:

float_column_names = df.select_dtypes(float).columns
df[float_column_names] = df[float_column_names].fillna(0)

string_column_names = df.select_dtypes(object).columns
df[string_column_names] df[string_column_names].fillna('.')
Berel Levy
  • 61
  • 4
0

There is a simpler way, that can be done in one line:

df.fillna({'Name':0,'City':0},inplace=True)

Not an awesome improvement but if you multiply it by 100, writting only the column names + ':0' is way faster than copying and pasting everything 100 times.

0

If you want to replace a list of columns ("lst") with the same value ("v")

def nan_to_zero(df, lst, v):
    d = {x:v for x in lst}
    df.fillna(d, inplace=True)
    return df
Tom
  • 1,003
  • 2
  • 13
  • 25
-1

If you don't want to specify individual per-column replacement values, you can do it this way:

df[['Name', 'City']].fillna('.',inplace=True)

If you don't like inplace (like me) you can do it like this:

columns = ['Name', 'City']
df[columns] = df.copy()[columns].fillna('.')

The .copy() is added to avoid the SettingWithCopyWarning, which is designed to warn you that the original values of a dataframe is overwritten, which is what we want.

If you don't like that syntax, you can see this question to see other ways of dealing with this: How to deal with SettingWithCopyWarning in Pandas

Devyzr
  • 299
  • 5
  • 13
-2

Much easy way is :dt.replace(pd.np.nan, "NA"). In case you want other replacement, you should use the next:dt.replace("pattern", "replaced by (new pattern)")

A. chahid
  • 184
  • 2
  • 5
  • 1
    This is just a different way of writing the standard `df.fillna('NA', inplace=True)` so you're not really gaining anything, plus it's a less standard way of filling `NaN` values in pandas. – Rob Bulmahn Nov 24 '20 at 00:54
  • No, its more general than what you think. this "dt.replace("pattern", "replaced by (new pattern)")"is given the programmer more choice to use any pattern to replace. – A. chahid Nov 24 '20 at 21:12
  • This doesn't answer the question of replacing multiple specific columns at once and advises going against convention for a very standard practice in pandas. – Ben Oct 06 '22 at 18:34