How can I copy a pandas dataframe that contains a list, so that future changes to the list in the copy will not alter the original one

Question

a = pd.DataFrame({'a': [[1,1],[2, 2],[3,3]], 'b':[3,2,1]})

       a    b
0   [1, 1]  3
1   [2, 2]  2
2   [3, 3]  1

After making a deep copy of a

b = copy.deepcopy(a)

Changing the first one list elements in b

b.a[0][0] = 0   

       a    b
0   [0, 1]  3
1   [2, 2]  2
2   [3, 3]  1

also changes the list in a

a

       a    b
0   [0, 1]  3
1   [2, 2]  2
2   [3, 3]  1

Both df.copy and copy.deepcopy gives the same result. How can I avoid this 'chain effect' in copy? Or is there a way to make a 'deeper' copy of dataframe

Try a.copy() which should work. Check this: https://stackoverflow.com/questions/27673231/why-should-i-make-a-copy-of-a-data-frame-in-pandas?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa — Attersson, May 11 '18 at 07:50
@Attersson That sounds logical, but from [the docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html): "Note that when deep=True data is copied, actual python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data." — Ami Tavory, May 11 '18 at 08:08

Sruthi · Accepted Answer · 2018-05-11T12:22:46.513

So I found a workaround :

First made b a new dataframe and copied the column b from dataframe a. Then created an empty column a.

b=pd.DataFrame()
b["b"]=a["b"]
b["a"]=None

Now we can iterate over the rows of b and copy the column a's values from dataframe a. Note that we set the type to object so that we can assign lists to entries of the dataframe, which otherwise will throw ValueError

b =b.astype(object) 
for i in range(len(b)):
    b.loc[i,"a"]=copy.deepcopy(a.loc[i,"a"])

Then make your required changes :

b.a[0][0] = 0

Now we have b :

   b       a
0  3  [0, 1]
1  2  [2, 2]
2  1  [3, 3]

And a remained unchanged

        a  b
0  [1, 1]  3
1  [2, 2]  2
2  [3, 3]  1

Now let's see what might have been the issue with using copy. I used the id function after applying copy and the results were:

b=copy.copy(a)
print(id(a)==id(b))                        #False
print(id(a["a"])==id(b["a"]))              #False
print(id(a.loc[0,"a"])==id(b.loc[0,"a"]))  #True

Since the column a has lists in it, both the dataframes refer to the same list and modifying one affected the other. This was the intuition behind iteratively copying the lists one by one. Note that the same results were seen even on using b=copy.deepcopy(a)

Azuuu · Answer 2 · 2020-09-17T03:03:00.693

0

I asked this question 2 years ago. After couple of years experiencing in pandas and data wrangling, I find it is always the best practice to not use mutable data structures in a pandas data-frame. So, if I were there to do this again, I would have used a Tuple instead of a List, such that a mutation is not applicable. Hence the 'copy' issue in the question is not a thing.

edited Sep 17 '20 at 03:03

answered Sep 07 '20 at 03:14

Azuuu

853
8
17

How can I copy a pandas dataframe that contains a list, so that future changes to the list in the copy will not alter the original one

2 Answers2