2

I am using the pandas replace function to replace a value. Please see the code below:

import pandas as pd

d = {'color' : pd.Series(['white', 'blue', 'orange']),
   'second_color': pd.Series(['white', 'black', 'blue']),
   'value' : pd.Series([1., 2., 3.])}
df1 = pd.DataFrame(d)
print(df1)

df = df1
df['color'] = df['color'].replace('white','red')

print(df1)

print(df)

I intended to change a value only in df, so why is the same value in df1 changed as well?

By contrast, the code below works as I expect and leaves df1 unchanged:

df=df.replace('white','red')
Tim Stack

2 Answers

4

You need to use .copy():

df = df1.copy()

That way, the changes you make to df will not propagate to df1.
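As a minimal sketch of this fix, assuming the same data as in the question, the following shows that df1 keeps its original values once df is created with .copy():

```python
import pandas as pd

df1 = pd.DataFrame({
    "color": ["white", "blue", "orange"],
    "second_color": ["white", "black", "blue"],
    "value": [1.0, 2.0, 3.0],
})

# .copy() returns an independent DataFrame (deep=True is the default)
df = df1.copy()
df["color"] = df["color"].replace("white", "red")

print(df1["color"].tolist())  # ['white', 'blue', 'orange']
print(df["color"].tolist())   # ['red', 'blue', 'orange']
```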

Let's try
2

Because both names refer to the same object.

When you write df = df1, no new DataFrame is created; the name df is simply bound to the same object that df1 refers to. Using id(), you can see that both names point to the same address.

>>> df = df1
>>> id(df)
41633008
>>> id(df1)
41633008
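The same point can be checked with the is operator, which is a bit easier to read than comparing id() values. The sketch below uses a reduced, one-column version of the question's data (an assumption made for brevity) and also shows why df = df.replace(...) in the question behaves differently: replace returns a new DataFrame, so df is rebound and df1 is left alone.

```python
import pandas as pd

df1 = pd.DataFrame({"color": ["white", "blue", "orange"]})
df = df1  # binds a second name to the same object; nothing is copied

print(df is df1)  # True

# Assigning a column through df modifies the one shared object
df["color"] = df["color"].replace("white", "red")
print(df1["color"].tolist())  # ['red', 'blue', 'orange']

# replace() returns a new DataFrame, so this rebinds df and leaves df1 alone
df = df.replace("red", "green")
print(df is df1)  # False
print(df1["color"].tolist())  # ['red', 'blue', 'orange']
```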

To make a new copy, you can use the DataFrame.copy method:

>>> df = df1.copy()
>>> id(df)
31533376
>>> id(df1)
41633008

Now you can see that the two names refer to different objects.

There is still much to learn about shallow and deep copies. Please read the documentation for more.
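As a rough illustration of the two kinds of copy, assuming a small one-column frame rather than the question's data: DataFrame.copy() is deep by default, while copy(deep=False) creates a new DataFrame object that may share its underlying data with the original. Whether a write to a shallow copy shows up in the original depends on the pandas version and on whether Copy-on-Write is enabled, so the sketch only demonstrates the deep-copy guarantee.

```python
import pandas as pd

df1 = pd.DataFrame({"value": [1.0, 2.0, 3.0]})

deep = df1.copy()               # deep=True by default: the data is copied
shallow = df1.copy(deep=False)  # new DataFrame object; may share data with df1

# Both are distinct Python objects from df1
print(deep is df1, shallow is df1)  # False False

# Writing to the deep copy never affects the original
deep.loc[0, "value"] = 100.0
print(df1.loc[0, "value"])  # 1.0
```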

Dishin H Goyani