
I thought that assigning a DataFrame to a second variable only creates a reference (pointer) to the same object, so in the code below the first element changes in 'both' df's. But what happens with .dropna() surprised me: it seems this method somehow creates a copy. Does anyone know how this works and why? See the second part of the code. I expected both df's to have changed.

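import numpy as np
import pandas as pd

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df1 = pd.DataFrame([1, np.nan, 2, 3, 5, np.nan])
df2 = df1
df3 = df1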

df2.iloc[0, 0] = 9
#Both changed
display(df1)
display(df2)

df3 = df3.dropna()

#Only df3 changed ??
display(df1)
display(df3)
MacMesser

1 Answer


When you do

df2.iloc[0, 0] = 9

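the operation is in place. Also, up to this point df2 and df3 are not copies of df1: all three names refer to the same DataFrame object, so any in-place modification made through one name is visible through the others.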
But when you do

df3 = df3.dropna()

dropna() returns a new DataFrame and leaves the original untouched; the assignment only rebinds the name df3 to that new object. df1 still refers to the original object, so it is not affected. But if you had done the following:

df3.dropna(inplace=True)

df3 would have been modified in place, and since it refers to the same object as df1, df1 would also have changed in the same way.
See "What rules does Pandas use to generate a view vs a copy?" for details.
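To check this yourself, here is a minimal sketch that uses Python's `is` operator to see which names point to the same object. The helper variable names (same_before, same_after, rows_df1 and so on) are made up for this illustration and are not part of the question's code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, np.nan, 2, 3, 5, np.nan])
df2 = df1   # second name for the same object, no copy is made
df3 = df1   # third name for the same object

same_before = df3 is df1        # True: one object, three names

df2.iloc[0, 0] = 9              # in-place write, visible through every name
first_value = df1.iloc[0, 0]    # 9.0

df3 = df3.dropna()              # dropna() returns a new DataFrame; df3 is rebound
same_after = df3 is df1         # False: df3 now names a different object
rows_df1 = len(df1)             # 6, the original still contains the NaN rows
rows_df3 = len(df3)             # 4

df2.dropna(inplace=True)        # modifies the object that df1 and df2 share
rows_df1_after = len(df1)       # 4
```

If you actually want an independent DataFrame from the start, `df2 = df1.copy()` gives you one, and later changes to df2 will not show up in df1.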

ggaurav