While answering this question I came across a behaviour I do not understand.
I am trying to apply fillna to specific columns, val2 and val3, in the rows that contain the first instance of each value in id. For some reason an inplace solution with fillna doesn't appear to work, and I don't understand why.
Let's assume this input dataframe:
    id  val1  val2  val3        date
0  102     9   NaN   4.0  2002-01-01
1  102     2   3.0   NaN  2002-03-03
2  103     4   NaN   NaN  2003-04-04
3  103     7   4.0   5.0  2003-08-09
4  103     6   5.0   1.0  2005-02-03
Desired output, with a fill value of -1:
    id  val1  val2  val3        date
0  102     9  -1.0   4.0  2002-01-01
1  102     2   3.0   NaN  2002-03-03
2  103     4  -1.0  -1.0  2003-04-04
3  103     7   4.0   5.0  2003-08-09
4  103     6   5.0   1.0  2005-02-03
Below is a solution that works and the inplace variant that does not work:
mask = ~df['id'].duplicated()
val_cols = ['val2', 'val3']
df.loc[mask, val_cols] = df.loc[mask, val_cols].fillna(-1) # WORKS
df.loc[mask, val_cols].fillna(-1, inplace=True) # DOES NOT WORK
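For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the input frame can be rebuilt as shown (with date parsed as datetimes). The names df_inplace, df_assign, remaining_inplace and remaining_assign are my own, introduced only to compare the two variants side by side on separate copies.

```python
import numpy as np
import pandas as pd

# Rebuild the input dataframe from the question (assumed construction).
df = pd.DataFrame({
    'id': [102, 102, 103, 103, 103],
    'val1': [9, 2, 4, 7, 6],
    'val2': [np.nan, 3.0, np.nan, 4.0, 5.0],
    'val3': [4.0, np.nan, np.nan, 5.0, 1.0],
    'date': pd.to_datetime(['2002-01-01', '2002-03-03', '2003-04-04',
                            '2003-08-09', '2005-02-03']),
})

mask = ~df['id'].duplicated()   # True for the first row of each id
val_cols = ['val2', 'val3']

# In-place variant, run on its own copy of the data.
df_inplace = df.copy()
df_inplace.loc[mask, val_cols].fillna(-1, inplace=True)
remaining_inplace = int(df_inplace.loc[mask, val_cols].isna().sum().sum())

# Assignment variant, run on its own copy of the data.
df_assign = df.copy()
df_assign.loc[mask, val_cols] = df_assign.loc[mask, val_cols].fillna(-1)
remaining_assign = int(df_assign.loc[mask, val_cols].isna().sum().sum())

print(remaining_inplace)  # 3: the NaNs in the masked rows are still present
print(remaining_assign)   # 0: the NaNs in the masked rows were filled
```

Counting the NaNs left in the masked rows makes the difference easy to check without comparing the printed frames by eye.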
I am using Python 3.6.5, Pandas 0.23.0, NumPy 1.14.3.
Possibly this is intended behaviour, but I haven't been able to find a duplicate. As far as I can see, there's no chained indexing involved.