I experienced some quite unexpected behavior when using the clip function of pandas.

So, here is a reproducible example:

import pandas as pd
df_init = pd.DataFrame({'W': [-1.00, 0.0, 0.0, 0.3, 0.5, 1.0]})
df_init['W_EX'] = df_init['W'] + 0.1
df_init['W_EX'].clip(upper=1.0, inplace=True)
df_init.loc[df_init['W']==-1.0, 'W_EX'] = -1.0
df_init

The output is, as one would expect:

Out[2]: 
     W  W_EX
0 -1.0  -1.0
1  0.0   0.1
2  0.0   0.1
3  0.3   0.4
4  0.5   0.6
5  1.0   1.0

However, when I inspect a specific value:

df_init.loc[df_init['W']==-1.0, 'W_EX']

I see the following output:

Out[3]: 
0   -0.9
Name: W_EX, dtype: float64

Although I used .loc to overwrite the first value in the column, and although the printed data frame shows the new value, selecting that row and column with .loc still returns the old value, the one I had before calling .clip.

Now it gets more complicated. If I select the new column with a list, ['W_EX'], I can see that the value has indeed been updated:

df_init.loc[df_init['W']==-1.0, ['W_EX']]

Out[4]: 
   W_EX
0  -1.0

And lastly, like in Schrödinger's cat experiment, if I now go back and repeat the earlier column selection after having run the one above, I actually see the value I would have expected in Out[3] in the first place:

df_init.loc[df_init['W']==-1.0, 'W_EX']

Out[5]: 
0   -1.0
Name: W_EX, dtype: float64

If I skip the .clip call, all is fine. Would someone more knowledgeable than myself please explain to me what is going on here?

Yannis P.
yanko
  • So after a bit of digging, I managed to narrow it down to this - it has something to do with doing the .clip operation inplace. If I do this, none of the unexpected behavior occurs: – yanko Aug 08 '22 at 10:29
  • There's some weirdness going on with `inplace=True`. It looks like it's not performing assignment the way you expect. It may be better to default to explicit assignment so it's clearer what's happening. `df_init['W_EX'] = df_init['W_EX'].clip(upper=1.0)` – Joe Carboni Aug 08 '22 at 18:59

1 Answer

It may be better to default to explicit assignment so that it's clearer what's happening. Calling .clip with inplace=True on a column selected from the data frame doesn't appear to update the data frame consistently.

There's some debate on whether the flag should stick around at all. See the question "In pandas, is inplace = True considered harmful, or not?"

df_init['W_EX'] = df_init['W_EX'].clip(upper=1.0)
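
As a minimal sketch of that suggestion applied to the example from the question (the names df, mask, as_series and as_frame are mine, not from the original post), the explicit assignment replaces the column with the clipped result, so both styles of .loc selection should agree afterwards:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'W': [-1.00, 0.0, 0.0, 0.3, 0.5, 1.0]})
df['W_EX'] = df['W'] + 0.1

# Explicit assignment: clip returns a new Series that replaces the column,
# instead of modifying a temporary column selection in place.
df['W_EX'] = df['W_EX'].clip(upper=1.0)

# Overwrite the row where W == -1.0, as in the question.
mask = df['W'] == -1.0
df.loc[mask, 'W_EX'] = -1.0

as_series = df.loc[mask, 'W_EX']    # scalar column label -> Series
as_frame = df.loc[mask, ['W_EX']]   # list of labels -> DataFrame

print(as_series.iloc[0], as_frame.iloc[0, 0])
```

Both lookups should return -1.0 here, because no in-place operation on a column selection is involved. This only illustrates the workaround; it does not explain why the stale value showed up in the question.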
Joe Carboni
  • Hey Joe, thank you very much for the pointer to the discussion. Honestly, I have already come around to accepting that `inplace=True` is harmful when used on slices (rows or columns alike). I still find it useful for stuff like `df.columns.rename({...}, inplace=True)` though. One thing I can't wrap my head around is this: why do the cell values in the data frame change after merely observing the column with `df_init.loc[df_init['W']==-1.0, ['W_EX']]`? – yanko Aug 08 '22 at 20:59