Conditional row drop in pandas affects previously defined dataframe

Question

I came across an issue in Python today. I was trying to conditionally drop some rows from a DataFrame. I wanted to keep a copy of the DataFrame first, so that I could see which rows had been dropped by comparing the copy with the DataFrame I dropped them from. Here is a simplified example:

import pandas as pd
# define simple data frame
d = {'col1': ['chris', 'ben'], 'col2': [3, 8]}
df=pd.DataFrame(data=d)
# define df2 to be identical to df AS CURRENTLY DEFINED
df2=df
# next, drop all rows in df if 'col1' = 'ben'
df.drop(df.loc[df['col1']=='ben'].index, inplace=True)

What really puzzles me is this: when I run the code line by line, the last line refers only to df, yet the drop is applied to df2 as well!

Why is this? Is there something I'm completely missing?

score 0 · Accepted Answer · answered Sep 16 '20 at 04:40


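Use df2 = df.copy() instead of df2 = df to create a copy. Plain assignment does not copy a DataFrame; it only binds a second name to the same object. Because drop(..., inplace=True) modifies that object, the change is visible through both df and df2.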
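A minimal sketch contrasting the two approaches, reusing the data from the question. The names alias and dropped are hypothetical, introduced only for illustration. The last step assumes the original index labels are unique and unchanged by the drop, so the rows that are in the copy but missing from df are exactly the dropped rows (found here with Index.difference).

```python
import pandas as pd

d = {'col1': ['chris', 'ben'], 'col2': [3, 8]}

# Plain assignment: both names refer to the same DataFrame object.
df = pd.DataFrame(data=d)
alias = df  # hypothetical name, not a copy
df.drop(df.loc[df['col1'] == 'ben'].index, inplace=True)
print(alias is df)   # True
print(len(alias))    # 1: the in-place drop is visible through both names

# .copy() makes an independent DataFrame (deep=True is the default).
df = pd.DataFrame(data=d)
df2 = df.copy()
df.drop(df.loc[df['col1'] == 'ben'].index, inplace=True)
print(df2 is df)     # False
print(len(df2))      # 2: the copy still holds both rows

# Index labels present in the copy but not in df belong to the dropped rows.
dropped = df2.loc[df2.index.difference(df.index)]  # hypothetical helper variable
print(dropped)       # the 'ben' row
```

If you do not need to keep the original around, a boolean mask such as df[df['col1'] != 'ben'] returns a new filtered DataFrame without modifying df at all.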
