Consider this code:

In [16]: data = [['Alex',10],['Bob',12],['Clarke',13]]
In [17]: df = pd.DataFrame(data,columns=['Name','Age'])
In [18]: df
Out[18]: 
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

In [19]: df_new = df
In [20]: df_new['Age'] = df_new['Age'] * 90 / 100

In [21]: df_new
     Name   Age
0    Alex   9.0
1     Bob  10.8
2  Clarke  11.7

In [22]: df
     Name   Age
0    Alex   9.0
1     Bob  10.8
2  Clarke  11.7

When I assigned new values to the Age column of the new DataFrame (df_new), the Age column of the original DataFrame (df) changed as well.

Why does this happen? Does it have something to do with the way I create a copy of the original DataFrame? It seems like the two are linked together.

foo

1 Answer

Use:

df_new = df.copy()

OR

df_new = df.copy(deep=True)

This is the standard way of making a copy of a pandas object’s indices and data.

From the pandas documentation:

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.
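As a minimal sketch of what that documentation passage means for the question's data (assuming pandas is installed and imported as pd), the snippet below repeats the Age update on a deep copy instead of on a second name for the same object. The printed lists are only there to make the difference visible.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# df.copy() defaults to deep=True, so df_new gets its own data and index.
df_new = df.copy()
df_new['Age'] = df_new['Age'] * 90 / 100

print(df['Age'].tolist())      # original ages are unchanged
print(df_new['Age'].tolist())  # only the copy holds the scaled values
```

Because df_new is a separate object here, the assignment to df_new['Age'] leaves df untouched.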

Explanation

If you look at the object IDs of the various DataFrames you create, you can see what is happening.

When you write df_new = df, you are not creating a new DataFrame. You are creating a second name, df_new, and binding it to the same object that df refers to, so both names have the same id.

Example

import pandas as pd

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])

df_new = df
df_copy = df.copy()
print("ID of old df: {}".format(id(df)))
print("ID of new df: {}".format(id(df_new)))
print("ID of copy df: {}".format(id(df_copy)))

Output

ID of old df: 113414664
ID of new df: 113414664
ID of copy df: 113414832
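
This behaviour is general Python name binding rather than something specific to pandas. A small sketch with a plain list shows the same thing; the names ages, ages_alias and ages_copy are made up for illustration, and the is operator is used in place of comparing id() values by eye.

```python
# Plain assignment binds a second name to the same object;
# .copy() builds a new object with the same contents.
ages = [10, 12, 13]
ages_alias = ages          # same object, two names
ages_copy = ages.copy()    # new list

ages_alias[0] = 9          # mutate through the alias

print(ages is ages_alias)  # True: both names refer to one object
print(ages is ages_copy)   # False: the copy is a different object
print(ages)                # [9, 12, 13], the change is visible via either name
print(ages_copy)           # [10, 12, 13], the copy is unaffected
```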
Vivek Kalyanarangan