The following code is a simple example. I explain what I'm doing at each step in the comments, and my question is at the end.

import datetime
import pandas as pd
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date, periods=3, freq='D')
columns = ['A','B','C']
df1 = pd.DataFrame(index=index, columns=columns)
df1 = df1.fillna(1) 

# up to here, I've just created a sample df1, which looks like the following:
#               A   B   C
# 2020-03-24    1   1   1
# 2020-03-25    1   1   1
# 2020-03-26    1   1   1

df2 = df1     # here created a copy of df1 named as df2, it should pass by value based on my knowledge 
df1 += df2.shift(1)    # this should be the same as df1 = df1 + df2.shift(1)

display(df1)  # here I print out df1, which looks like the following:
#               A   B   C
# 2020-03-24    NaN NaN NaN
# 2020-03-25    2.0 2.0 2.0
# 2020-03-26    2.0 2.0 2.0

display(df2)  # here I print out df2; the result surprises me because I thought df2 had not changed since it was defined, yet it is now the same as the new df1:
#               A   B   C
# 2020-03-24    NaN NaN NaN
# 2020-03-25    2.0 2.0 2.0
# 2020-03-26    2.0 2.0 2.0

Can anyone explain to me why df2 is changed in these steps? I'm really confused.

mkrieger1
Jeremy
  • 5
    https://nedbatchelder.com/text/names.html – Alex Hall Mar 24 '20 at 10:50
  • 1
    If you want df2 to be different from df1 then you need to make a copy of df1: `df2 = df1.copy()`. Doing `df2 = df1` just says: create a variable name called df2 and point it at whatever df1 points to. Thus if you change df1, you will see that change in df2, since they both point to the same place. – Chris Doyle Mar 24 '20 at 10:52
  • [Shallow copy and deep copy](https://stackoverflow.com/questions/4794244/how-can-i-create-a-copy-of-an-object-in-python) – Craicerjack Mar 24 '20 at 10:52
  • wait a sec. if i do the following: a = 1 ; b = a ; a = 2 ; display(b) ; # "b" is still =1 right? even i change the value of "a" from 1 to 2 – Jeremy Mar 24 '20 at 10:54
  • @Jeremy integers are immutable objects. Try with a list: `a=[]; b=a; a.append(1); print(b)` – GPhilo Mar 24 '20 at 10:55
  • You are mistaking now the difference between mutable and immutable. Mutable objects can change the data they point to. Immutable objects cannot and they will create and point to a new object. – Chris Doyle Mar 24 '20 at 10:55
  • 1
    @GPhilo **no**. The **type** of the object **is completely irrelevant**. In that case, `a` is *merely getting re-assigned*. The *same exact behavior* happens with lists, `a = []; b = a; a = [42]; print(b)` – juanpa.arrivillaga Mar 24 '20 at 10:56
  • 1
    @ChrisDoyle no! mutability is **not relevant**. The semantics of assignment are exactly the same for all objects, regardless of type. – juanpa.arrivillaga Mar 24 '20 at 10:57
  • 1
    Yes, but in the case of a=5; b=a; a+=5; it's important for the OP to understand that ints are immutable, and while semantically it looks like a is being "modified", it really is not: a new object is created and a is pointed to it, leaving b pointing to the old object – Chris Doyle Mar 24 '20 at 10:58
  • 1
    @ChrisDoyle that is merely because `int.__iadd__` is not a mutator method. Note, that comment used regular assignment anyway, not augmented assignment. All *immutable* means is "lacks any mutator methods in the public API", objects *are not treated differently* based on type when it comes to the fundamental language semantics. Immutable objects *simply lack mutator methods*. Assignment *always works the same exact way*. In any case, there is nothing stopping me from implementing my custom, immutable class with an `__iadd__` that does the same thing as the immutable `int` version – juanpa.arrivillaga Mar 24 '20 at 11:00
  • 1
    But to get back to the crux of the matter, `b = a` **never creates a copy**, regardless of the type of `a`. Note, *pass by reference or pass by value* are *evaluation strategies*, which concern *when and how the arguments to a function call are evaluated*. In Python, there is *only one evaluation strategy*, it is neither call by value nor call by reference, but rather, [call by sharing](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing). The issue here is the *semantics of assignment*. Python always uses reference semantics for assignment. – juanpa.arrivillaga Mar 24 '20 at 11:02
  • @juanpa.arrivillaga **Your example**: a = []; b = a; a = [42]; print(b) vs. **GPhilo's example**: a=[]; b=a; a.append(1); print(b) yield totally different results. Now I'm really confused... Do you have a simpler explanation for me as a beginner? Maybe you can write it below and I will accept it as the best answer; I guess it would help many others. I'm sorry about my ignorance. – Jeremy Mar 24 '20 at 11:07
  • 1
    @Jeremy because `a = [42]` and `a.append(1)` *are different things completely*. one is an assignment statement. This *merely* re-binds the name on the left to the object on the right. `a.append(1)` is a method call, which is an expression that evaluates to a value, `None` in this case, and that is by convention because this method happens to be a *mutator* method, it *changes* the object. List objects can change, `tuple` objects cannot, you can only create new tuples. Really, you should just read [this](https://nedbatchelder.com/text/names.html) which is a great resource with clear diagrams – juanpa.arrivillaga Mar 24 '20 at 11:08
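
The distinction drawn in these comments can be sketched with plain lists and ints. This is only an illustration; all variable names are made up for the example. Plain assignment rebinds a name, a mutator method changes the shared object, and `+=` does one or the other depending on whether the type provides an in-place add.

```python
# Rebinding: `a = [42]` gives the name `a` to a new list; `b` is untouched.
a = []
b = a
a = [42]
print(b)  # []

# Mutation: `append` changes the one list that both names refer to.
c = []
d = c
c.append(1)
print(d)  # [1]

# Augmented assignment on an int: ints have no in-place add,
# so `x += 5` builds a new int and rebinds `x`; `y` keeps the old one.
x = 5
y = x
x += 5
print(y)  # 5

# Augmented assignment on a list: lists extend in place,
# so the change is visible through the other name as well.
p = [1]
q = p
p += [2]
print(q)  # [1, 2]
```

In every case `b = a` itself copies nothing; the outcomes differ only because of what happens to the object afterwards.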

1 Answer

df2 = df1     # here created a copy of df1 named as df2

The comment is not correct and may be the cause of your misunderstanding.

This line means: df2 is now another name for whatever is currently known as df1.

So if you change the object which is known as df1, you will also see this change when you refer to df2.
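
A minimal sketch of that behaviour with a DataFrame, assuming a reasonably recent pandas. The names `alias` and `snapshot` are hypothetical, and the frame is built with floats directly rather than via `fillna`. It contrasts a second name for the same object with an independent copy made by `DataFrame.copy()`.

```python
import pandas as pd

index = pd.date_range("2020-03-24", periods=3, freq="D")
df1 = pd.DataFrame(1.0, index=index, columns=["A", "B", "C"])

alias = df1            # a second name for the same DataFrame object
snapshot = df1.copy()  # a separate DataFrame with its own data

# Augmented assignment updates the existing object instead of
# binding df1 to a new one, so `alias` sees the change too.
df1 += alias.shift(1)

print(alias is df1)     # True: one object, two names
print(snapshot is df1)  # False: the copy is a different object
print(alias)            # first row NaN, remaining rows 2.0
print(snapshot)         # still all 1.0
```

Writing `df1 = df1 + alias.shift(1)` instead would bind `df1` to a new DataFrame and leave the object behind `alias` unchanged, which is why the two spellings are not interchangeable here.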

mkrieger1
  • wait a sec. if i do the following: a = 1 ; b = a ; a = 2 ; display(b) ; # b is still =1 right? even i change the value of a from 1 to 2 – Jeremy Mar 24 '20 at 10:52
  • 2
    By `a = 2` you don't *change* the object (the integer 1) which was known as `a`, you only give the name `a` to *another* object (the integer 2). – mkrieger1 Mar 24 '20 at 10:54
  • @Jeremy you aren't mutating the object referenced by `a`, you are **re-assigning `a`**. The same thing with dataframes: `a = pd.DataFrame([[1,2],[3,4]]); b = a; a = pd.DataFrame([[5,6],[7,8]]); print(b)` – juanpa.arrivillaga Mar 24 '20 at 10:55
  • @JosephChotard really, python's evaluation strategy isn't relevant here. This is about the semantics of assignment, not how function calls work. – juanpa.arrivillaga Mar 24 '20 at 10:55