
I have seen questions about the copy() method not working for nested data columns, where modifying something on the copy also altered the original dataframe. However, all I could find was about renaming a nested field of the dataframe, as in this question.

However, I am not renaming anything; I am altering a field inside the nested column. So I just want to confirm whether that also alters the original dataframe even though a copy was made. If so, how can I make a copy of a dataframe with nested columns that doesn't affect the original?

For example, in this code I have a dataframe with a column of dictionaries. Each dictionary has just one field, which is a list. It was expected to contain only integers, but some floats slipped in, so I want to convert them all to integers without altering the original dataframe.

However, if I apply a user-defined function to the copied dataframe, it affects the original as well:

import pandas as pd

df = pd.DataFrame({'a': [{'field': [1, 2, 3.0]}, {'field': [1, 2, 4.0]}, {'field': [1, 2, 5.0]}]})
print('printing the original dataframe: \n', df['a'])

def integer_converter(x):
    # Mutates the dict stored in the cell in place (returns None)
    x['a']['field'] = [int(i) for i in x['a']['field']]

df2 = df.copy(deep=True)
df2.apply(integer_converter, axis=1)
print('printing df2 after function: \n', df2['a'])

print('printing the original dataframe again: \n', df['a'])

The outputs were:

printing the original dataframe: 
 0    {'field': [1, 2, 3.0]}
1    {'field': [1, 2, 4.0]}
2    {'field': [1, 2, 5.0]}
Name: a, dtype: object
printing df2 after function: 
 0    {'field': [1, 2, 3]}
1    {'field': [1, 2, 4]}
2    {'field': [1, 2, 5]}
Name: a, dtype: object
printing the original dataframe again: 
 0    {'field': [1, 2, 3]}
1    {'field': [1, 2, 4]}
2    {'field': [1, 2, 5]}
Name: a, dtype: object
  • `df.copy()` does not perform a deep copy - your dicts are still the same. You can do `x['a'] = x['a'] | {'field': [int(i) for i in x['a']['field']]}` instead. – STerliakov Aug 25 '23 at 17:24
  • Sorry, I just edited the code in the question: I added the parameter deep=True to the copy method, but it still modifies the original one. Do you mean that even with that parameter it doesn't make a deep copy? Also, what does this syntax do, or what is it called so I can research it further? I've not seen it before: `x['a']=x['a'] | ` – Eugenio.Gastelum96 Aug 25 '23 at 17:28
  • `x['a'] = x['a'] | {dictcomp}` is a simple dictionary update (not in-place; operator reference [here](https://peps.python.org/pep-0584/)) with a dictcomp, not some weird syntactic construct. `a | b` is the union of dictionaries `a` and `b` (on duplicate keys, the right-hand operand wins). – STerliakov Aug 25 '23 at 19:42
  • With `object` dtype, each cell contains a reference (pointer) to a Python object. In this case that object is a `dict`. And the value for each key in the dict is also a reference, to a list. – hpaulj Aug 25 '23 at 23:29
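A small pure-Python illustration of the `|` operator discussed in the comments (dict union from PEP 584, available since Python 3.9). It returns a new dict and leaves both operands untouched; on duplicate keys the right-hand value wins. The variable names are illustrative only.

```python
# Requires Python 3.9+ for the dict union operator (PEP 584)
original = {'field': [1, 2, 3.0]}

# `|` builds a NEW dict; the right-hand 'field' replaces the left-hand one
updated = original | {'field': [int(i) for i in original['field']]}

print(updated)              # {'field': [1, 2, 3]}
print(original)             # {'field': [1, 2, 3.0]}
print(updated is original)  # False

# "Last wins": for a key present in both operands, the right-hand value is kept
merged = {'x': 1, 'y': 2} | {'y': 3}
print(merged)               # {'x': 1, 'y': 3}
```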

1 Answer

Your example copies an object that contains Python objects (dicts). A deep copy will copy the data but will NOT do so recursively, so updating a nested data object in one dataframe will be reflected in the other.

In other words, df.copy(deep=True) deep-copies the underlying data array of your df, but not the elements stored in that array (the dicts, in your case). Both dataframes therefore hold references to the same dict objects.
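A quick way to see this, sketched below: the cells of the deep copy are the very same dict objects, so `is` returns True. One way to get a copy that is independent all the way down (a swapped-in technique, not from the answer) is to run `copy.deepcopy` on every cell of the nested column with `Series.map`. The names `shallow_cells` and `independent` are purely illustrative.

```python
import copy

import pandas as pd

df = pd.DataFrame({'a': [{'field': [1, 2, 3.0]}, {'field': [1, 2, 4.0]}, {'field': [1, 2, 5.0]}]})

# deep=True copies the object array, but each cell still points at the same dict
shallow_cells = df.copy(deep=True)
print(shallow_cells.at[0, 'a'] is df.at[0, 'a'])  # True

# Assumption: recursively copying every cell is acceptable for your data size
independent = df.copy(deep=True)
independent['a'] = independent['a'].map(copy.deepcopy)
print(independent.at[0, 'a'] is df.at[0, 'a'])  # False

# In-place edits on the independent copy no longer reach the original
independent.at[0, 'a']['field'][2] = 3
print(df.at[0, 'a']['field'])  # [1, 2, 3.0]
```

With `independent` built this way, the question's in-place `integer_converter` could be applied to it without touching `df`.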

If you really want to keep things nested, you can refine your function: call copy() on the dict, edit the copy, and return the edited copy. Because apply then returns a new result instead of changing df2 in place, you also need to overwrite df2 with it, using df2 = df2.apply(lambda x: ……, axis=1)
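A minimal sketch of that approach, assuming a single column `a` as in the question. The `row.copy()` step is an extra precaution added here (not part of the answer) so the function never assigns into df2's own storage. A shallow `dict.copy()` is enough because `field` is replaced with a brand-new list rather than mutated in place.

```python
import pandas as pd

df = pd.DataFrame({'a': [{'field': [1, 2, 3.0]}, {'field': [1, 2, 4.0]}, {'field': [1, 2, 5.0]}]})

def integer_converter(row):
    # Work on a copy of the row so the assignment below stays local
    row = row.copy()
    # Shallow dict copy: the original dict (shared with df) is never modified
    nested = row['a'].copy()
    nested['field'] = [int(i) for i in nested['field']]
    row['a'] = nested
    return row

df2 = df.copy(deep=True)
# apply returns a NEW DataFrame, so the result must be assigned back
df2 = df2.apply(integer_converter, axis=1)

print(df2['a'].tolist())  # [{'field': [1, 2, 3]}, {'field': [1, 2, 4]}, {'field': [1, 2, 5]}]
print(df['a'].tolist())   # [{'field': [1, 2, 3.0]}, {'field': [1, 2, 4.0]}, {'field': [1, 2, 5.0]}]
```

Since the dicts held by `df` are never mutated, the original keeps its floats.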

Please refer to the official example at the bottom of the pandas copy() documentation.