How to copy dataframe in pandas

Question

I have a python script which does the following:

final_df = main_df
final_df['new_column'] = final_df['b']*0.05

Basically, I don't want to disturb the main_df dataframe; I want to work on a copy of it. But when I run the above script, both final_df and main_df are modified in the same way. Why does this happen?

How do I get the behavior I want?

``final_df = main_df.copy()`` that way main_df is not affected — sammywemmy, Sep 16 '21 at 04:46

score 2 · Answer 1 · answered Sep 16 '21 at 04:47

Use the copy method:

final_df = main_df.copy()

Salek

score 1 · Answer 2 · answered Sep 16 '21 at 04:48

You have to make a deep copy; otherwise final_df points to the same object as main_df. So it would be:

final_df = main_df.copy()  # copy() makes a deep copy by default (deep=True)

Here is the link to the documentation.
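
To see the difference on an actual DataFrame, here is a minimal sketch. The data and the name alias_df are made up for illustration; only the column name 'b' and the new_column calculation come from the question.

```python
import pandas as pd

# Hypothetical data; only the column name 'b' comes from the question.
main_df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

# Plain assignment: both names refer to the same DataFrame object.
alias_df = main_df
print(alias_df is main_df)  # True

# copy() returns a new, independent DataFrame (deep=True is the default).
final_df = main_df.copy()
final_df["new_column"] = final_df["b"] * 0.05

print(final_df is main_df)               # False
print("new_column" in main_df.columns)   # False
print("new_column" in final_df.columns)  # True
```

The new column appears only in final_df, so main_df is left as it was.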

user2736738

score 1 · Answer 3 · answered Sep 16 '21 at 05:01

The reason is that final_df = main_df does not make a copy of main_df; it only creates a new reference, final_df, to the same object. Both names then refer to the same memory location, so any change made through one of them (final_df in your case) also shows up in main_df.
Example:

# A plain list stands in for the DataFrame here; assignment behaves the same way
main_df = ['final', 'df']
final_df = main_df
print(f'Location of final_df: {id(final_df)}')
print(f'Location of main_df: {id(main_df)}')

Both print statements will print the same memory location.
Here is a nice writeup that explains this behavior.
If you don't want main_df to be affected, create a deep copy of it as below:

final_df = main_df.copy()

With copy(), a completely new copy is created, which can be verified with the code below:

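# A plain list again stands in for the DataFrame. Note that list.copy() is shallow,
# whereas DataFrame.copy() is deep by default; either way, a new object is created.
main_df = ['final', 'df']
final_df = main_df.copy()
print(f'Location of final_df: {id(final_df)}')
print(f'Location of main_df: {id(main_df)}')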

Now the two print statements will print two different memory locations.
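
One caveat about using a list as a stand-in: list.copy() is a shallow copy, while DataFrame.copy() is deep by default. The difference only matters when the container holds mutable objects. The sketch below uses only the standard library, and the nested list is made up for illustration.

```python
import copy

# Nested list standing in for a container of mutable values (illustrative only).
main = [["final"], ["df"]]

shallow = main.copy()          # new outer list, same inner lists
deep = copy.deepcopy(main)     # new outer list and new inner lists

print(shallow is main)         # False
print(shallow[0] is main[0])   # True
print(deep[0] is main[0])      # False

# Mutating an inner list through the shallow copy is visible in main.
shallow[0].append("changed")
print(main[0])                 # ['final', 'changed']
print(deep[0])                 # ['final']
```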
