I experienced some quite unexpected behavior when using the clip function of pandas.
So, here is a reproducible example:
import pandas as pd
df_init = pd.DataFrame({'W': [-1.00, 0.0, 0.0, 0.3, 0.5, 1.0]})
df_init['W_EX'] = df_init['W'] + 0.1
df_init['W_EX'].clip(upper=1.0, inplace=True)
df_init.loc[df_init['W']==-1.0, 'W_EX'] = -1.0
df_init
The output is, as one would expect:
Out[2]:
     W  W_EX
0 -1.0  -1.0
1  0.0   0.1
2  0.0   0.1
3  0.3   0.4
4  0.5   0.6
5  1.0   1.0
However, when I inspect a specific value:
df_init.loc[df_init['W']==-1.0, 'W_EX']
I see the following output:
Out[3]:
0 -0.9
Name: W_EX, dtype: float64
Although I used .loc to overwrite the first value in the column, and although I can see the new value when printing the data frame, when I use .loc with a row slice I see the value I had before using .clip.
Now it gets more complicated. If I select the new column with a list of labels, which returns a data frame rather than a series, I can see that the value has indeed been updated:
df_init.loc[df_init['W']==-1.0, ['W_EX']]
Out[4]:
   W_EX
0  -1.0
And lastly, like in Schrödinger's cat experiment, if I now go back and look at the column values again after that last inspection, I can see that the value is indeed the one I would have expected in the first place (in Out[3]):
df_init.loc[df_init['W']==-1.0, 'W_EX']
Out[5]:
0 -1.0
Name: W_EX, dtype: float64
If I skip the .clip call, all is fine. Would someone more knowledgeable than me please explain what is going on here?
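For comparison, here is a minimal sketch of a variant that assigns the clipped result back to the column instead of passing inplace=True. It does not explain the behavior above. It only shows the same steps without the in-place call, and the names df and selected are just for illustration:

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'W': [-1.00, 0.0, 0.0, 0.3, 0.5, 1.0]})

# Clip a new Series and assign it back, rather than clipping the column in place
df['W_EX'] = (df['W'] + 0.1).clip(upper=1.0)

# Overwrite the value for the row where W is -1.0
df.loc[df['W'] == -1.0, 'W_EX'] = -1.0

# Select the same row again as a Series
selected = df.loc[df['W'] == -1.0, 'W_EX']
print(selected)
```

Written this way, the last selection returns -1.0 for row 0, matching what the printed data frame shows.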