The reason is that when you do final_df = main_df, no copy of main_df is made. Only a new reference, final_df, is created, so both final_df and main_df refer to the same object in memory. If one is changed, like final_df in your case, the change is also reflected in main_df, because both names point to the same memory location.
Example:
main_df = ['final', 'df']
final_df = main_df
print(f'Location of final_df: {id(final_df)}')
print(f'Location of main_df: {id(main_df)}')
Both of the print statements above will print the same memory location.
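To see why this matters in practice, here is a minimal sketch that mutates the object through one name and reads it back through the other. It uses a plain list as a stand-in for the DataFrame, as in the example above; the appended value 'changed' is only for illustration.

```python
main_df = ['final', 'df']
final_df = main_df           # no copy: just a second name for the same list

final_df.append('changed')   # mutate the object through one name

print(main_df)               # ['final', 'df', 'changed']
print(final_df is main_df)   # True: both names refer to the same object
```

The is operator checks object identity directly, which is the same thing the id() comparison shows.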
Here is a nice writeup to understand this behavior.
If you don't want main_df to be affected, create a copy of it as below (for a pandas DataFrame, copy() makes a deep copy by default):
final_df = main_df.copy()
With copy(), a completely new object is created, which can be verified with the code below:
main_df = ['final', 'df']
final_df = main_df.copy()  # list.copy() is a built-in method, no import needed
print(f'Location of final_df: {id(final_df)}')
print(f'Location of main_df: {id(main_df)}')
Now the two print statements will print two different memory locations.
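One caveat worth knowing: for a plain Python list, copy() is a shallow copy, so nested objects are still shared. The sketch below contrasts it with copy.deepcopy from the standard library. The nested list and the names shallow and deep are made up for illustration and are not from your code.

```python
import copy

main_df = [['final'], ['df']]    # nested list, a stand-in for real data
shallow = main_df.copy()         # new outer list, same inner lists
deep = copy.deepcopy(main_df)    # new outer list and new inner lists

shallow.append(['extra'])        # changes only the shallow copy's outer list
shallow[0].append('shared')      # inner list is shared, so main_df sees this
deep[1].append('independent')    # touches only the deep copy

print(main_df)   # [['final', 'shared'], ['df']]
print(deep)      # [['final'], ['df', 'independent']]
```

With a pandas DataFrame, copy() defaults to deep=True, so the data itself is copied and edits to final_df do not reach main_df.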