It is a relatively simple problem, but it is driving me crazy.

When I sort the two data frames below, df1 and df2, in a for loop, I don't get any error. However, when I print them afterwards, they are not sorted at all.

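```python
import pandas as pd

d = {"a": [1, 6, 2, 4, 3],
     "b": [1, 2, 3, 4, 5]}

k = {"a": [7, 12, 8, 11, 9],
     "d": [1, 2, 3, 4, 5]}

df1 = pd.DataFrame(d)
df2 = pd.DataFrame(k)

all_data = [df1, df2]

# Runs without error, but df1 and df2 are still unsorted afterwards
for data in all_data:
    data = data.sort_values(by=["a"])
```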

However, when I add the inplace parameter, the change is kept. I thought that using `inplace=True` and assigning the result to the `data` variable were equivalent. Can you help me understand the logic behind this?
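A minimal sketch of what the loop actually does, trimmed to `df1` only (assuming pandas is installed). Inside the loop, `data` is just a second name for the object stored in the list; the assignment rebinds that name to the new DataFrame returned by `sort_values` and leaves the list entry alone.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 6, 2, 4, 3], "b": [1, 2, 3, 4, 5]})
all_data = [df1]

for data in all_data:
    # `data` and all_data[0] refer to the same DataFrame object here
    print(data is all_data[0])   # True
    # Without inplace, sort_values returns a NEW DataFrame; this only rebinds `data`
    data = data.sort_values(by=["a"])
    print(data is all_data[0])   # False: the list still holds the original frame

print(all_data[0]["a"].tolist())  # [1, 6, 2, 4, 3]
```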

  • "I thought using inplace=True parameter and assigning to the "data" variable was equivalent." No, it isn't. One mutates the object, so all other references to the object will see the change. The other creates a new object and assigns it to the variable. They are effectively the same if that variable is the only reference to the object, but in this case it isn't (the objects are being referenced by the list, which is what you actually want to change). – juanpa.arrivillaga May 07 '20 at 17:34
  • I know this is the answer and I know you said it as clearly as possible. However, I am still having difficulty understanding it. Can you at least tell me which one mutates and which one creates a new variable? Also, what are the two references that are being confused? I appreciate it! – Kamuran T. May 07 '20 at 21:13
  • Neither of them "creates a new variable". When you use `inplace=True`, `.sort_values` *mutates* the dataframe object it was called on. If you don't, it *returns a new dataframe object*. Again, it's very important to understand: *no new variables are created*, but *new objects are created*. Of course, if *you* create a new variable, `x = whatever; new_x = x.something()`, then a new variable is created, but that is not directly related. – juanpa.arrivillaga May 07 '20 at 21:14
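Following the comments above, a sketch of two ways to make the loop keep its result (assuming pandas; `make_frames` is a hypothetical helper that just rebuilds the question's two frames). Option 1 mutates each object, so every reference to it sees the change; Option 2 keeps the non-mutating call and stores the returned objects back in the list.

```python
import pandas as pd

def make_frames():
    # Hypothetical helper: rebuild the two example frames from the question
    df1 = pd.DataFrame({"a": [1, 6, 2, 4, 3], "b": [1, 2, 3, 4, 5]})
    df2 = pd.DataFrame({"a": [7, 12, 8, 11, 9], "d": [1, 2, 3, 4, 5]})
    return df1, df2

# Option 1: mutate the objects in place, so df1, df2 and the list all see the change
df1, df2 = make_frames()
for data in [df1, df2]:
    data.sort_values(by=["a"], inplace=True)
print(df1["a"].tolist())  # [1, 2, 3, 4, 6]

# Option 2: keep the non-mutating call and put the new objects into the list
df1, df2 = make_frames()
all_data = [df1, df2]
all_data = [data.sort_values(by=["a"]) for data in all_data]
print(all_data[0]["a"].tolist())  # [1, 2, 3, 4, 6]
print(df1["a"].tolist())          # [1, 6, 2, 4, 3]: df1 still names the original object
```

Option 2 only updates the list, so if you keep using the names `df1` and `df2` you would either use Option 1 or rebind them afterwards, e.g. `df1, df2 = all_data`.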

0 Answers0