I am new to Python and am absolutely foxed by why the following happens:

  • I start with a dataframe df1
  • I make a copy of it and call it df2
  • I change a value in the copy (df2)
  • That changes the value in df1 also!

Here is a modified version of code I found in another question on Stack Overflow (the original question is "Replace single value in a pandas dataframe, when index is not known and values in column are unique"):

import pandas as pd

# Create a dataframe df1
df1 = pd.DataFrame([[5, 2], [3, 4]], columns=('a', 'b'))

#print df1
df1

    a   b
0   5   2
1   3   4

# copy it into df2
df2=df1

#print df2
df2

    a   b
0   5   2
1   3   4

# modify the value in df2 in column b where column a is 3
df2.loc[df2.a == 3, 'b'] = 6
    
# print df2 to check that the value has changed
df2

   a   b
0  5   2
1  3   6

# BUT changing df2 changed df1 also! Print df1
df1

   a   b
0  5   2
1  3   6

Can someone please explain this? Thanks

1 Answer

Try the following code:

df2 = df1.copy()

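The assignment df2 = df1 does not create a new dataframe. It only binds a second name, df2, to the same underlying object, which is why the change you made through df2 is also visible through df1. Calling df1.copy() creates a separate dataframe with its own data, so later changes to df2 leave df1 untouched.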
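To see the difference for yourself, here is a small sketch built on the dataframe from the question. The names alias and independent are just illustrative, and the commented results assume copy() is called with its default deep copy behaviour:

```python
import pandas as pd

df1 = pd.DataFrame([[5, 2], [3, 4]], columns=["a", "b"])

alias = df1                # plain assignment: a second name for the same object
independent = df1.copy()   # a new DataFrame with its own data

# "is" checks object identity, not equality of contents
same_object = alias is df1            # True
separate_object = independent is df1  # False

# Modifying the real copy leaves df1 unchanged
independent.loc[independent.a == 3, "b"] = 6
after_copy_edit = df1.loc[1, "b"]     # still 4

# Modifying through the alias changes df1, because they are the same object
alias.loc[alias.a == 3, "b"] = 6
after_alias_edit = df1.loc[1, "b"]    # now 6

print(same_object, separate_object, after_copy_edit, after_alias_edit)
```

The same rule applies to other mutable Python objects such as lists and dicts: assignment never copies, it only adds another name for the object.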

r0ot293