23

I'm trying to fill NAs with "" on 4 specific columns in a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I fillna() inplace the underlying data doesn't change.

a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST     1542  non-null values
PROV FIRST    1542  non-null values
PROV MID      1542  non-null values
SPEC NM       1542  non-null values
dtypes: object(4)

but

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6

gives me:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI           1103  non-null values
PIN           1542  non-null values
PROV FIRST    1541  non-null values
PROV LAST     1542  non-null values
PROV MID      1316  non-null values
SPEC NM       1541  non-null values
flag          439  non-null values
dtypes: float64(2), int64(1), object(4)

It's just one row, but still frustrating. What am I doing wrong?

C8H10N4O2
Beau Bristow
  • 4
    I've also come across some functions where `inplace=True` seems to be ignored. While that's not the issue in your case, it's worth keeping in mind when troubleshooting. – Zero Sep 12 '14 at 03:00

5 Answers

36

Use a dict as the value argument to fillna()

As mentioned in the comment by @rhkarls on @Jeff's answer, using .loc indexed to a list of columns won't support inplace operations, which I too find frustrating. Here's a workaround.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,np.nan],
                   'b':[6,7,8,np.nan,np.nan],
                   'x':[11,12,13,np.nan,np.nan],
                   'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   NaN
#2  3.0  8.0  13.0   NaN
#3  4.0  NaN   NaN  19.0
#4  NaN  NaN   NaN   NaN

Let's say we want to fillna for x and y only, not a and b.

I would expect using .loc to work (as in an assignment), but it doesn't, as mentioned earlier:

# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed

However, the documentation says that the value argument to fillna() can be:

alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).

It turns out that using a dict of values will work:

# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   0.0
#2  3.0  8.0  13.0   0.0
#3  4.0  NaN   0.0  19.0
#4  NaN  NaN   0.0   0.0

Also, if you have a lot of columns in your subset, you could build the dict with dict.fromkeys() (or a dict comprehension), as in:

df.fillna(dict.fromkeys(['x', 'y'], 0), inplace=True) # also works
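Applied to string columns like the ones in the question, the same idea might look like the sketch below. The tiny a_n6 frame is made-up illustration data, not the asker's real data; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's frame (assumption: real data differs).
a_n6 = pd.DataFrame({
    "NPI": [1.0, np.nan],
    "PROV LAST": ["Smith", "Jones"],
    "PROV MID": [None, "Q"],
})

name_cols = ["PROV LAST", "PROV MID"]

# One fill value per listed column; columns not in the dict are left alone.
a_n6.fillna(dict.fromkeys(name_cols, ""), inplace=True)

mid_values = a_n6["PROV MID"].tolist()
npi_missing = int(a_n6["NPI"].isna().sum())
```

Here the missing middle name becomes an empty string, while the missing NPI stays missing because "NPI" is not a key in the dict.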
Eric O. Lebigot
C8H10N4O2
11

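You are filling a copy (which you then can't see). Selecting a list of columns returns a new DataFrame, so the in-place fill never reaches a_n6.

Either don't fillna in place (there is no performance gain from doing something in place) and assign the result back, for example: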

a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]] = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")

or preferably

a_n6.fillna({'PROV LAST': '', 'PROV FIRST': '',
            'PROV MID': '', 'SPEC NM': ''}, inplace=True)

Here's a more in-depth explanation: Pandas: Chained assignments
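To see the copy behaviour for yourself, a small check along these lines may help. It is only a sketch on a made-up frame; the warnings filter is there because, depending on the pandas version, the in-place call on the temporary selection may emit a warning.

```python
import warnings

import numpy as np
import pandas as pd

# Made-up example frame (not the asker's data).
df = pd.DataFrame({"a": [1.0, np.nan], "x": [np.nan, 2.0], "y": [3.0, np.nan]})
cols = ["x", "y"]

# Selecting a list of columns builds a new DataFrame, so this in-place
# fill only changes that temporary object, not df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[cols].fillna(0, inplace=True)
still_missing = int(df[cols].isna().sum().sum())

# Assigning the filled result back is what actually updates df.
df[cols] = df[cols].fillna(0)
missing_after = int(df[cols].isna().sum().sum())
```

After the first call, x and y in df still contain their NaNs; only the assignment back removes them, and column a is never touched.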

Areza
Jeff
  • 9
    I thought the inplace argument was supposed to prevent it from filling a copy. What is the point of the `inplace` arg if it doesn't change the behavior of the function? – wordsforthewise Sep 04 '16 at 22:34
  • 4
    So why is inplace even allowed for fillna()? – codingknob Sep 24 '16 at 01:41
  • 3
    Inplace will work if you use .loc. Inplace should not work if you are working on a copy. See the links that Jeff included. It will not work for a list of fields (e.g. df.loc[:,[list of fields]]), but it will work on a slice or single field. Also see https://github.com/pandas-dev/pandas/issues/11984 for some details on this. – rhkarls Nov 30 '16 at 12:11
  • I just ran fillna on a 20Gb dataset, with inplace and got 'not implemented' error. I don't see the rant about 'no performance gain'. I don't have another 20Gb memory to have a temp copy. Inplace would be highly beneficial. – Cowboy Trader Dec 03 '18 at 16:06
1

A workaround is to save the fillna results in another variable and assign them back, like this:

na_values_filled = X.fillna(0)
X = na_values_filled

My exact example (which I couldn't get to work otherwise) was a case where I wanted to fillna in only the first line of every group. Like this:

groups = one_train.groupby("installation_id")
first_indexes_filled = one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'].fillna(0)
one_train.loc[groups.apply(pd.DataFrame.first_valid_index), 'clicks'] =  first_indexes_filled

My case may be unnecessarily complicated, but I think the general "save results, then assign back" method should work as a workaround for the failing inplace=True.
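A self-contained sketch of the same "fill a selection, then assign it back" pattern is below. The data is invented, and it assumes "first line of every group" means the first row in file order for each installation_id, which it picks out with a boolean mask from duplicated() rather than groupby().apply().

```python
import numpy as np
import pandas as pd

# Invented stand-in for one_train (assumption: real data differs).
one_train = pd.DataFrame({
    "installation_id": ["a", "a", "b", "b", "b"],
    "clicks": [np.nan, np.nan, np.nan, 4.0, np.nan],
})

# True for the first row of each installation_id, False for the rest.
first_rows = ~one_train.duplicated(subset="installation_id")

# Fill only the selected cells, then write the result back to the same cells.
filled = one_train.loc[first_rows, "clicks"].fillna(0)
one_train.loc[first_rows, "clicks"] = filled
```

Only rows 0 and 2 (the first row of each group) are filled; NaNs elsewhere in clicks stay as they were.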

user2677285
  • I had to do the same thing - I was using bfill and ffill conditionally, and the introduction of the condition seemed to prevent fillna from working. – James_SO Sep 20 '22 at 12:50
0

The top answer gave me SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame, so this is what I ended up with. It works and doesn't give any warnings:

fill_dict = {x: 0 for x in columns_of_interest}
df.loc[:, columns_of_interest].fillna(fill_dict, inplace=True)
jss367
  • This did get rid of the `SettingWithCopyWarning`, but then the `inplace=True` did not work (`df` did not change). – wisbucky Dec 20 '22 at 02:17
0

The "Use a dict as the value argument" answer doesn't work for me, but an easy enough workaround is to use:

for n in ["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]:
    # assign back instead of a chained inplace call, which newer pandas versions warn about or ignore
    a_n6[n] = a_n6[n].fillna("")
a_n6
Nicola