
I have a DataFrame df. I made a backup of it using:

df_backup = df

Later in my code, I deleted a few records from the original df using:

df.drop(df.index[indexes], inplace = True)

These rows get deleted from the backup as well. It looks like df_backup is just another name for df rather than a separate copy. How do I decouple the two?

If I change anything in df, it shouldn't affect df_backup.

ayhan

1 Answer


You can decouple them by making an actual copy, which is a separate object:

df_backup = df.copy()

As Anthony Sottile pointed out, df_backup = df creates another reference to your original DataFrame rather than a new object. That means a change made through either df or df_backup shows up in both. He also suggested a good link to help understand this.
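
To make the difference concrete, here is a minimal sketch that compares plain assignment with .copy(). The sample data, the name alias, and the values in indexes are made up for illustration, since the question doesn't show the real frame:

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real frame.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": list("wxyz")})

alias = df            # plain assignment: a second name for the same object
df_backup = df.copy() # .copy(): a new, independent DataFrame

indexes = [0, 2]      # hypothetical row positions to delete
df.drop(df.index[indexes], inplace=True)

print(alias is df)       # True: both names refer to one object
print(df_backup is df)   # False: the copy is a separate object
print(len(df), len(alias), len(df_backup))  # 2 2 4
```

The in-place drop is visible through alias because it is the same object as df, while df_backup still has all four rows.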

Alter
    Might be worth adding that `df_backup = df` does not create a copy: in Python, assignment just makes a second reference pointing to the same object. OP seems confused about this :) Perhaps https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference might clear some things up too. – anthony sottile Jul 05 '17 at 00:29