tl;dr: Operations on a dataframe created using .copy() do not affect the original dataframe when its elements are of certain types (strings, for example), but they do when the elements are lists.
Cases where Pandas behaves as expected
Creating the new dataframe by assignment
I understand that, under some conditions, creating a new variable bar from an existing pandas dataframe foo means that operations on bar also change foo. Hence:
import pandas as pd
foo = pd.DataFrame({'A':['one','two','three'],'B':['apple','banana','grape'],'C':['cat','zebra','donkey']})
bar = foo
bar['A'] = ['four','five','six']
will give foo['A'] = ['four','five','six'].
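As a minimal sketch of what I think is going on here (using a cut-down, one-column version of foo for brevity), an identity check suggests that plain assignment does not create a second dataframe at all; bar is just another name for the same object:

```python
import pandas as pd

# Cut-down version of foo with a single column, for illustration only.
foo = pd.DataFrame({'A': ['one', 'two', 'three']})
bar = foo  # plain assignment: no new dataframe is created

# Both names refer to one and the same object.
same_object = bar is foo

# Writing through bar is therefore visible through foo.
bar['A'] = ['four', 'five', 'six']
result = foo['A'].tolist()

print(same_object, result)
```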
Using .copy() and performing operations on columns
Fair enough. It takes no more than a minute of searching to find that what you really need to do is:
foo = pd.DataFrame({'A':['one','two','three'],'B':['apple','banana','grape'],'C':['cat','zebra','donkey']})
bar = foo.copy()
bar['A'] = ['four','five','six']
and, faithfully enough, this gives foo['A'] = ['one','two','three'].
Using .copy() and performing operations on single elements
Likewise, performing an operation on just one element of bar leaves foo unaffected, as desired:
foo = pd.DataFrame({'A':['one','two','three'],'B':['apple','banana','grape'],'C':['cat','zebra','donkey']})
bar = foo.copy()
bar.loc[0, 'A'] = 'four'  # .loc sets the single cell without chained assignment
which, as expected, gives foo['A'] = ['one','two','three'].
Where things go wrong: Using .copy() and performing operations on single elements that are lists
Here comes the twist:
foo = pd.DataFrame({'A':[['one','two','three'],['four','five','six'],['seven','eight','nine']]})
bar = foo.copy()
bar['A'][0][0] = 'apple'
gives foo['A'][0] = ['apple','two','three'].
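To narrow this down, here is a small sketch (again with a shortened foo, and with variable names of my own choosing) that checks object identity. As far as I can tell, the two dataframes are distinct objects, yet the corresponding cells still point at the same Python list:

```python
import pandas as pd

# Shortened version of the list-valued dataframe, for illustration only.
foo = pd.DataFrame({'A': [['one', 'two', 'three'], ['four', 'five', 'six']]})
bar = foo.copy()

# The two dataframes themselves are different objects...
frames_distinct = bar is not foo

# ...but the first cell of each refers to the very same list object.
lists_shared = bar.loc[0, 'A'] is foo.loc[0, 'A']

# Mutating that list in place through bar is visible through foo.
bar.loc[0, 'A'][0] = 'apple'
first_cell_of_foo = foo.loc[0, 'A']

print(frames_distinct, lists_shared, first_cell_of_foo)
```

So the copy seems to duplicate the container of cells but not the list objects stored inside the cells.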
Why does this happen? What is the correct way to make a copy of the original dataframe if I don't want operations on the new dataframe to affect the original dataframe when the elements of the dataframe are lists?