I was explaining[1] in-place operations vs out-of-place operations to a new user of Pandas. This led us to discuss passing objects by reference or by value.
Naturally, I wanted to show pandas.DataFrame.values, as I thought it shared memory with the underlying data of the DataFrame. However, I was surprised, and then sidetracked, by the results of the following code segment.
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3, 4],
                   'y': [5, 4, 3, 2]})
print(df)
df.values += 1 # raises AttributeError
   x  y
0  1  5
1  2  4
2  3  3
3  4  2
<ipython-input-126-9fa9f393972b>:8: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
df.values += 1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
5169 else:
-> 5170 object.__setattr__(self, name, value)
5171 except (AttributeError, TypeError):
AttributeError: can't set attribute
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-126-9fa9f393972b> in <module>
6 print(df)
7
----> 8 df.values += 1
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
5178 stacklevel=2,
5179 )
-> 5180 object.__setattr__(self, name, value)
5181
5182 def _dir_additions(self):
AttributeError: can't set attribute
However, despite this error, if we re-examine the df, it has changed.
print(df)
   x  y
0  2  6
1  3  5
2  4  4
3  5  3
My attempt to explain this behavior:
First, we can rewrite df.values += 1 as df.values = df.values.__iadd__(1).
That means the RHS of this expression evaluates properly, which changes the underlying data in place. Then, re-assigning df.values to the result raises the exception, because values is a read-only property.
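The same two-step behavior can be reproduced without pandas. The following is a minimal sketch in plain Python; Box is a hypothetical stand-in class (not anything from pandas) that exposes a mutable list through a read-only property, which is my assumption about the relevant part of how .values behaves.

```python
class Box:
    """Hypothetical stand-in for an object with a read-only .values property."""

    def __init__(self, data):
        self._data = data

    @property
    def values(self):
        # Returns the same mutable object on every access; there is no setter.
        return self._data


box = Box([1, 2, 3])
error_raised = False
try:
    # Step 1: box.values.__iadd__([4]) extends the list in place.
    # Step 2: the result is assigned back to box.values, which has no setter.
    box.values += [4]
except AttributeError:
    error_raised = True

print(error_raised)  # True
print(box.values)    # [1, 2, 3, 4]
```

The in-place mutation in step 1 has already happened by the time step 2 fails, so the exception does not undo it.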
If I break up these two operations, no error is raised and the underlying data is changed.
print(df)
values = df.values
values += 1
print(df)
   x  y
0  2  6
1  3  5
2  4  4
3  5  3
   x  y
0  3  7
1  4  6
2  5  5
3  6  4
Is this a bug?
Should .values be handled differently than it currently is via __getattr__/__setattr__?
Part of me wants to say this is not a bug, as the user should read the documentation and use the recommended replacement, pandas.DataFrame.to_numpy.
However, part of me says that it is pretty unintuitive to see an "AttributeError: can't set attribute" but have the underlying operation actually work. That being said, I can't think of a solution that allows these operations to work in the proper situations while still preventing improper use.
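For comparison, here is a rough sketch of the to_numpy route, assuming a pandas version where to_numpy accepts the copy parameter. Passing copy=True requests an array that does not share memory with the DataFrame, so an in-place update on it should leave df alone. The variable name arr is mine.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4],
                   'y': [5, 4, 3, 2]})

# copy=True asks for an independent array rather than a view of df's data
arr = df.to_numpy(copy=True)
arr += 1  # modifies only the copy

print(df)   # df still holds the original values
print(arr)  # every entry is one larger than in df
```

This sidesteps the surprise rather than resolving it: the sharing (or not) is stated explicitly instead of depending on what .values happens to return.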
Does anyone have any insights into this?
[1]: Until I got derailed by this issue and [Insert Link] potential issue.