I am applying a very simple transformation to a pandas DataFrame inside a function, and I did not expect the function to modify the input DataFrame, but it did. Why does this happen?

Here's my code:

import pandas as pd

x = pd.DataFrame({'a': [1,2,3], 'b': [3,4,5]})

def transform(df, increment):
    new_df = df
    new_df.a = new_df.a + increment
    return new_df

new_x = transform(x, 1)

new_x # output shows new_x.a is [2,3,4], which is expected.
x # output shows x.a is also [2,3,4]. I thought it should be [1,2,3]

Why is this the case? Inside the function, every operation is performed on new_df, so I expected the input x to be exactly the same before and after calling transform. Shouldn't it be?

user3768495

1 Answer

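This happens because new_df = df does not create a copy. It only binds a second name to the same DataFrame object that x refers to, so any change made through new_df is also visible through x.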
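One way to see the difference between a second reference and a copy is to compare object identity with `is`. The snippet below is a minimal sketch; the names `alias` and `clone` are made up for illustration and are not part of the question's code.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})

alias = x         # plain assignment: a second name for the same object
clone = x.copy()  # a new DataFrame with its own data (deep copy by default)

same_object = alias is x   # True: both names point at one DataFrame
copy_is_same = clone is x  # False: the copy is a separate object

alias.a = alias.a + 1      # modifies the single shared object

x_values = x.a.tolist()          # [2, 3, 4]: x sees the change
clone_values = clone.a.tolist()  # [1, 2, 3]: the copy does not
```

The same thing happens inside the function: the parameter df and the local name new_df both refer to the object passed in by the caller.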

import pandas as pd

x = pd.DataFrame({'a': [1,2,3], 'b': [3,4,5]})


def transform(df, increment):
    new_df = df.copy() # <--- piece to change
    new_df.a = new_df.a + increment
    return new_df

new_x = transform(x, 1)

new_x # output shows new_x.a is [2,3,4], which is expected.
x # output shows x.a is now [1,2,3].

Adding .copy() gives you the expected behavior: the function modifies its own copy, and x is left unchanged.
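As a possible alternative to an explicit copy, `DataFrame.assign` returns a new DataFrame with the column replaced and does not modify the original. This is only a sketch of a different way to write the same transform, not the approach used in the code above.

```python
import pandas as pd

def transform(df, increment):
    # assign() builds and returns a new DataFrame; df itself is not modified
    return df.assign(a=df.a + increment)

x = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})
new_x = transform(x, 1)

new_values = new_x.a.tolist()  # [2, 3, 4]
old_values = x.a.tolist()      # [1, 2, 3]
```

This keeps the function free of side effects without needing an intermediate variable.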

MattR