
I don't understand why this renaming operation affects the original DataFrame even though I used `copy()`. Why does `df_copy` behave like a view of `df` rather than a real copy? I would expect the print statement to output `'x'`, not `'y'`.

```python
import pandas

df = pandas.DataFrame({'x': [0, 1]})
df_copy = df.copy(deep=True)
df_copy.columns.values[0] = 'y'
print(df.columns)
```
bbiegel
  • It seems [there is a bug](https://stackoverflow.com/questions/43291781/after-rename-column-get-keyerror) – jezrael May 28 '18 at 09:07
  • @jezrael I don't seem to experience that bug with `0.20.3` – zipa May 28 '18 at 09:15
  • @zipa - I tested it in `pandas 0.23.0` and get the same problem. – jezrael May 28 '18 at 09:16
  • @jezrael I found a [very similar if not the same](https://stackoverflow.com/questions/17591104/in-pandas-can-i-deeply-copy-a-dataframe-including-its-index-and-column/17591423#17591423) question. It's from 2013 :) – zipa May 28 '18 at 09:21

1 Answer


From the docs:

> Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.
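A minimal sketch of what that note means, assuming a column that holds Python lists (the column name `a` and its contents are made up for illustration). The deep copy gets its own array of references, but both arrays point at the same list objects, so mutating a list through the copy is visible in the original.

```python
import pandas as pd

# Hypothetical example: an object-dtype column whose cells are Python lists.
df = pd.DataFrame({'a': [[1, 2], [3]]})
df_copy = df.copy(deep=True)

# The deep copy duplicates the array of references, not the lists themselves,
# so appending through the copy mutates the list the original also holds.
df_copy.at[0, 'a'].append(99)

print(df.at[0, 'a'])                      # [1, 2, 99]
print(df.at[0, 'a'] is df_copy.at[0, 'a'])  # True: same list object
```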

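It seems the same holds for the column labels when you modify them by position through `.values` (as your example shows). An `Index` is meant to be immutable, and `.values` exposes its underlying array, so writing into it changes data that `df` and `df_copy` still share instead of giving the copy its own label.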
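For contrast, a short sketch (assuming a reasonably recent pandas) of the public `Index` API refusing an in-place edit. Item assignment on an `Index` raises `TypeError`; the question's code only got past that check because it wrote into the array returned by `.values` instead.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1]})
df_copy = df.copy(deep=True)

# Index objects reject item assignment through their public API.
try:
    df_copy.columns[0] = 'y'
    blocked = False
except TypeError as exc:
    blocked = True
    print(exc)

print(blocked)
print(list(df.columns))
```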

When you reassign the columns instead, the behavior is as expected:

```python
df_copy.columns = ['y']
print(df.columns)
# Index([u'x'], dtype='object')
```
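A sketch of two ways to relabel the copy without touching the original, assuming the same `df` as in the question: assigning a new list to `.columns` replaces the copy's `Index` object outright, and `DataFrame.rename` returns a new frame with new labels. Neither writes into an existing `Index`.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1]})

# Option 1: replace the copy's column Index with a brand-new one.
df_copy = df.copy(deep=True)
df_copy.columns = ['y']

# Option 2: rename() builds a new DataFrame with relabelled columns.
df_renamed = df.rename(columns={'x': 'y'})

print(list(df.columns))          # original labels untouched
print(list(df_copy.columns))
print(list(df_renamed.columns))
```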
zipa