In this example, r1 and r2 refer to the same object, but r1 is r2 evaluates to False because their ids are different, so assert(not r1 is r2) passes. I would expect the assertion to fail, since r1 and r2 refer to the same object!

import pandas as pd
df = pd.DataFrame([0])
r1 = df.iloc[0]
r2 = df.iloc[0]
assert(not r1 is r2)
r1[0] = 1
assert(r1.equals(r2))
print(id(r1), id(r2))
>> 140547055257416 140547055258032

Explanations on why this happens can be found in array slicing in numpy

Chuan
  • Use `.copy()`. As to why, see https://stackoverflow.com/questions/27673231/why-should-i-make-a-copy-of-a-data-frame-in-pandas – Michael Gardner Nov 15 '20 at 06:31
  • I think you misunderstood my question. I have rephrased it. – Chuan Nov 15 '20 at 06:33
  • Sorry, this will answer your question. https://stackoverflow.com/questions/47972633/in-pandas-does-iloc-method-give-a-copy-or-view – Michael Gardner Nov 15 '20 at 06:37
  • An interesting read, but it does not show me how to check whether two variables refer to the same pandas object. – Chuan Nov 15 '20 at 06:41
  • You can call it a bug, but would you want pandas to keep a dict of every iloc call you made, just so that the same ID is returned each time? – trigonom Nov 15 '20 at 07:02

2 Answers


You can use np.may_share_memory or np.shares_memory to check whether r1 and r2 are backed by the same memory:

import numpy as np

np.may_share_memory(r1, r2)
# True
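The difference between the two functions, and between both of them and `is`, can be sketched with plain NumPy arrays. This is only an illustration: the names `base`, `v1`, `v2`, `evens` and `odds` are made up for the example, and it uses NumPy slices rather than pandas objects, whose view-versus-copy behaviour depends on the pandas version and settings.

```python
import numpy as np

base = np.arange(4)

# Two separate slice operations produce two distinct Python objects...
v1 = base[::2]
v2 = base[::2]
different_objects = v1 is not v2          # `is` compares the wrapper objects

# ...that are nevertheless backed by the same buffer.
same_buffer = np.shares_memory(v1, v2)    # exact overlap check

# may_share_memory only compares memory bounds, so it can report True
# for interleaved slices that never touch the same element.
evens, odds = base[::2], base[1::2]
bounds_overlap = np.may_share_memory(evens, odds)
really_overlap = np.shares_memory(evens, odds)

# An explicit copy owns its own buffer.
copy_shares = np.shares_memory(v1, v1.copy())

print(different_objects, same_buffer, bounds_overlap, really_overlap, copy_shares)
```

In short, np.may_share_memory is a cheap bounds check that can give false positives, while np.shares_memory is exact but can be slower on complicated strides.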
halfer
Ch3steR

First, let's run a simple experiment to see that r1 and r2 are actually the same object in the pandas sense:

import pandas as pd

df = pd.DataFrame([0,1,2,3])
r1 = df.iloc[:,:1]
r2 = df.iloc[:,:1]

r1.iloc[2] = -10
r2.iloc[1] = -100
assert (not r1 is r2)

print(pd.concat((df,r1,r2),axis=1).to_string())

Running this script gives the following output:

     0    0    0
0    0    0    0
1 -100 -100 -100
2  -10  -10  -10
3    3    3    3

This means pandas considers r1 and r2 to be the same object: a write through either one shows up in df and in the other.

In fact, if you run this script

unique_ids = []
for _ in range(1000):
    one_id = id(df.iloc[:,:1])
    unique_ids.append(one_id)
set(unique_ids)

you will see that the length of set(unique_ids) is not 1!

According to @user2357112 supports Monica's comment under this post:

I don't think the ID you receive has any relation to the addresses of the array elements; it's the address of a header containing array metadata and a pointer to the storage used for the elements.

Basically, r1 and r2 are different objects referring to the same array elements.
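The "different header, same storage" idea can be shown with standard-library Python alone. This is an analogy, not what pandas or NumPy do internally: the `bytearray` stands in for the element storage and the two `memoryview` objects stand in for r1 and r2.

```python
buf = bytearray(4)        # stands in for the shared element storage

m1 = memoryview(buf)      # stands in for r1
m2 = memoryview(buf)      # stands in for r2

m1[0] = 7                 # write through one wrapper...
seen_by_m2 = m2[0]        # ...and read the value back through the other

distinct_wrappers = m1 is not m2     # two header objects, hence two ids
same_storage = m1.obj is m2.obj      # both wrap the very same buffer

print(distinct_wrappers, same_storage, seen_by_m2)
```

Here `is` and `id` only say whether two names point at the same wrapper object. Whether two wrappers share data has to be asked of the storage they point to, which is what np.shares_memory does for arrays.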

meTchaikovsky
  • You have illustrated my problem without answering it. You showed that r1 and r2 are considered the same object but their ids are different. – Chuan Nov 15 '20 at 08:08
  • @Chuan see my updated post, I guess `r1` and `r2` are headers containing array metadata. – meTchaikovsky Nov 15 '20 at 08:39
  • Thanks for the detailed explanation. I marked Ch3steR's answer as correct because it answered the question. – Chuan Nov 16 '20 at 05:08