I came across an issue in Python today. I was trying to conditionally drop some rows from a pandas data frame. I wanted to keep a copy of the original data frame so that, after dropping rows, I could compare the two and see which rows had been dropped. Here is a simplified example:

import pandas as pd

# define a simple data frame
d = {'col1': ['chris', 'ben'], 'col2': [3, 8]}
df = pd.DataFrame(data=d)
# intended: define df2 to be identical to df AS CURRENTLY DEFINED
df2 = df
# next, drop all rows in df where 'col1' == 'ben'
df.drop(df.loc[df['col1'] == 'ben'].index, inplace=True)

What really puzzles me is that, when I execute this line by line, the last line only refers to df, yet the rows are dropped from df2 as well!

Why is this? Is there something I'm completely missing?

1 Answer

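Use df2 = df.copy() instead of df2 = df to create a copy. Plain assignment does not copy anything: it only gives the same DataFrame object a second name, so an in-place drop on df is visible through df2 too.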
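
A minimal sketch of the difference, reusing the data frame from the question. The names alias, backup and dropped are only illustrative, and the last step is just one possible way to recover the dropped rows by comparing index labels:

```python
import pandas as pd

d = {'col1': ['chris', 'ben'], 'col2': [3, 8]}
df = pd.DataFrame(data=d)

alias = df          # a second name for the same object; nothing is copied
backup = df.copy()  # an independent DataFrame with its own data

# drop the 'ben' rows in place, exactly as in the question
df.drop(df.loc[df['col1'] == 'ben'].index, inplace=True)

print(alias is df)   # True: both names refer to one object
print(len(alias))    # 1: the drop is visible through the alias
print(len(backup))   # 2: the copy is unaffected

# rows whose index labels are in the copy but no longer in df
dropped = backup.loc[backup.index.difference(df.index)]
print(dropped)
```

Comparing index labels works here because drop keeps the original labels on the remaining rows; if the index were reset after dropping, a different comparison would be needed.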