
While answering this question I came across a behaviour I do not understand.

I am trying to fill NaN values in the columns `val2` and `val3`, but only in the row where each value of `id` first appears. For some reason an in-place solution with `fillna` doesn't appear to work, and I don't understand why.

Let's assume this input dataframe:

    id  val1  val2  val3       date
0  102     9   NaN   4.0 2002-01-01
1  102     2   3.0   NaN 2002-03-03
2  103     4   NaN   NaN 2003-04-04
3  103     7   4.0   5.0 2003-08-09
4  103     6   5.0   1.0 2005-02-03

Desired output, with a fill value of -1:

    id  val1  val2  val3       date
0  102     9  -1.0   4.0 2002-01-01
1  102     2   3.0   NaN 2002-03-03
2  103     4  -1.0  -1.0 2003-04-04
3  103     7   4.0   5.0 2003-08-09
4  103     6   5.0   1.0 2005-02-03

Below is a solution that works and the inplace variant that does not work:

mask = ~df['id'].duplicated()
val_cols = ['val2', 'val3']

df.loc[mask, val_cols] = df.loc[mask, val_cols].fillna(-1)  # WORKS
df.loc[mask, val_cols].fillna(-1, inplace=True)             # DOES NOT WORK

I am using Python 3.6.5, Pandas 0.23.0, NumPy 1.14.3.

Possibly this is intended behaviour, but I haven't been able to find a duplicate. As far as I can see, there's no chained indexing involved.
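For anyone who wants to reproduce this, here is a minimal self-contained sketch of both variants. The frame is rebuilt by hand from the table above, and the names `df_inplace` and `df_assign` are only illustrative. It has not been run on the exact versions listed, and newer pandas releases may also emit a chained-assignment warning on the inplace line.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values typed in by hand).
df = pd.DataFrame({
    'id': [102, 102, 103, 103, 103],
    'val1': [9, 2, 4, 7, 6],
    'val2': [np.nan, 3.0, np.nan, 4.0, 5.0],
    'val3': [4.0, np.nan, np.nan, 5.0, 1.0],
    'date': pd.to_datetime(['2002-01-01', '2002-03-03', '2003-04-04',
                            '2003-08-09', '2005-02-03']),
})

mask = ~df['id'].duplicated()   # True for the first row of each id
val_cols = ['val2', 'val3']

# Inplace variant: fillna acts on the temporary object returned by .loc,
# so df_inplace itself keeps its NaN values.
df_inplace = df.copy()
df_inplace.loc[mask, val_cols].fillna(-1, inplace=True)

# Assignment variant: the filled values are written back through .loc.
df_assign = df.copy()
df_assign.loc[mask, val_cols] = df_assign.loc[mask, val_cols].fillna(-1)

# Count the NaN values left in the two value columns for each variant.
nan_after_inplace = int(df_inplace[val_cols].isna().sum().sum())
nan_after_assign = int(df_assign[val_cols].isna().sum().sum())
print(nan_after_inplace, nan_after_assign)
print(df_assign)
```

Comparing the two NaN counts makes the difference visible without having to inspect the frames by eye.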

jpp
  • Check this, man: https://github.com/pandas-dev/pandas/issues/14858 – BENY Dec 15 '18 at 01:46
  • @W-B, Thank you. I find it confusing that `SettingWithCopyWarning` comes up so many times when it's not relevant. But here, when it is relevant, there's nothing. – jpp Dec 15 '18 at 01:48
  • As such, I also think we need to point to some documentation to know that `loc` always gives a copy. Maybe I missed something in the docs? – jpp Dec 15 '18 at 01:49
  • @jpp `loc` does not always give a copy; sometimes it returns a view, e.g. `df.loc[:, 'val2'].fillna(-1, inplace=True)`. `df.loc[:, column]` may return a view, but not always, for instance when it selects multiple columns as in `df.loc[:, df.columns[2:4]]`. It seems that `loc` with multiple columns returns a copy, whereas `loc` with a single column returns a view. – It_is_Chris Dec 15 '18 at 02:18
  • See [unutbu's answer](https://stackoverflow.com/questions/27367442/pandas-dataframe-view-vs-copy-how-do-i-tell) here along with the comments. `loc` with `__setitem__` works on the view. `loc` with `__getitem__` returns a copy most of the time. – ayhan Dec 15 '18 at 02:19
  • @ayhan, Thank you. Basically there are rules of thumb, but whether you get a copy or a view is an implementation detail. I'm going to close this as a duplicate. But if anyone wants to construct a better answer, feel free to reopen. – jpp Dec 15 '18 at 02:22
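
The copy-versus-view point from the comments can be checked by giving the intermediate result a name instead of discarding it. This is only a sketch on a cut-down, hypothetical frame (three rows, no `val1` or `date`), and `subset` is an illustrative name. It shows the boolean-mask case from the question, not a general rule for every `loc` call.

```python
import numpy as np
import pandas as pd

# Cut-down, hypothetical version of the frame from the question.
df = pd.DataFrame({'id': [102, 102, 103],
                   'val2': [np.nan, 3.0, np.nan],
                   'val3': [4.0, np.nan, np.nan]})
mask = ~df['id'].duplicated()   # first occurrence of each id

# .loc used for reading (__getitem__) with a boolean mask and a list of
# columns hands back a separate object in this case.
subset = df.loc[mask, ['val2', 'val3']]
subset.fillna(-1, inplace=True)   # modifies subset only

subset_filled = not subset.isna().any().any()    # subset has no NaN left
df_untouched = df['val2'].isna().sum() == 2      # df still has both NaN in val2
print(subset_filled, df_untouched)
```

In the one-liner from the question the filled object is never bound to a name, so the result is simply thrown away.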

0 Answers