I ran into this problem while trying to verify some properties of a DataFrame view.
Suppose I have a DataFrame defined as:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=list('abc'), data=np.arange(18).reshape(6, 3))
and a view of this DataFrame defined as:
df1 = df.iloc[:3, :]
We now have two DataFrames as follows:
print(df)
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17
print(df1)
   a  b  c
0  0  1  2
1  3  4  5
2  6  7  8
Now I want to output the id of a particular cell of these two dataframes:
print(id(df.loc[0, 'a']))
print(id(df1.loc[0, 'a']))
and I get the following output:
140114943491408
140114943491408
The weird thing is that if I run those two print(id(...)) lines again, the ids change:
140114943491480
140114943491480
I should emphasize that I did not re-run the code that defines df and df1 between runs of the two print(id(...)) lines, so neither DataFrame was redefined. As I understand it, the memory address of each element in a DataFrame should then be fixed, so how can the output change?
Something even stranger happens when I keep running those two print(id(...)) lines. On rare occasions the two ids are not even equal to each other:
140114943181088
140114943181112
But if I execute
id(df.loc[0, 'a']) == id(df1.loc[0, 'a'])
at the same moment, Python still outputs True. I know that since df1 is a view of df, their cells should share the same memory, so how can their ids occasionally differ?
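In case it helps anyone reproduce this, here is a minimal sketch of how the two lookups could be compared without relying on printed ids. It keeps both scalars alive in the variables x and y (names I made up for this snippet), compares them by value and by identity, and uses np.shares_memory and DataFrame.to_numpy() to check my assumption that df1 shares its buffer with df. I am assuming a reasonably recent pandas and NumPy here, and the results may differ between versions, so I have not written down expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=list('abc'), data=np.arange(18).reshape(6, 3))
df1 = df.iloc[:3, :]

# Hold a reference to each scalar so that neither object can be
# freed before the comparisons below are made.
x = df.loc[0, 'a']
y = df1.loc[0, 'a']

same_value = bool(x == y)   # do the two lookups return equal values?
same_object = x is y        # are they the very same Python object?

# Assumption: for a single-dtype frame, to_numpy() exposes the underlying
# buffer, so this tells us whether df1 is backed by df's data.
shares_buffer = bool(np.shares_memory(df.to_numpy(), df1.to_numpy()))

print(type(x).__name__, same_value, same_object, shares_buffer)
```

Comparing x is y while both names are still bound avoids depending on ids of objects that may no longer exist by the time the second id is taken.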
These strange behaviors leave me completely at a loss. Could anyone explain them? Are they due to how DataFrames work or to how the id function works in Python? Thanks!
FYI, I am using Python 3.5.2.