
I am comparing two elements of a NumPy array. The memory addresses returned by id() for the two elements are different, and the is operator reports that the two elements are not the same object.

However, if I compare the memory addresses of the two elements using the == operator, the result says they are the same.

I don't understand how == can return True when the two memory addresses are different.

Below is my code.

import numpy as np

a = np.arange(8)
newarray = a[np.array([3,4,2])]

print("Initial array : ", a)
print("New array : ", newarray)

# comparison of two elements using the 'is' operator
print("\ncomparison using is operator : ",a[3] is newarray[0])

# comparison of the memory addresses of two elements using the '==' operator
print("comparison using == operator : ", id(a[3]) == id(newarray[0]))

# memory address of both elements of array
print("\nMemory address of a : ", id(a[3]))
print("Memory address of newarray : ", id(newarray[0]))

Output:

Initial array : [0 1 2 3 4 5 6 7]
New array : [3 4 2]

comparison using is operator : False
comparison using == operator : True

Memory address of a : 2807046101296
Memory address of newarray : 2808566470576

DuDa
Rahul Kumbhar

2 Answers


This is probably due to a combination of Python's integer caching and obscure implementation details of NumPy.

If you change the code slightly, you will see that the ids are not consistent from one line to the next, but they are the same within each line:

import numpy as np

a = np.arange(8)
newarray = a[np.array([3,4,2])]
print(id(a[3]), id(newarray[0]))
print(id(a[3]), id(newarray[0])) 

outputs

276651376 276651376
20168608 20168608
DeepSpace

A NumPy array does not store references to objects the way a list does (unless it has object dtype). It has a 1-D data buffer holding the numeric values, which it can access in various ways.

In [17]: a = np.arange(8)
    ...: newarray = a[np.array([3,4,2])]
In [18]: a
Out[18]: array([0, 1, 2, 3, 4, 5, 6, 7])
In [21]: newarray
Out[21]: array([3, 4, 2])

newarray, produced with advanced indexing, is not a view. It has its own data buffer and values.

Let's 'unbox' elements of these arrays, assigning them to variables.

In [22]: x = a[3]; y = newarray[0]
In [23]: x
Out[23]: 3
In [24]: y
Out[24]: 3
In [25]: id(x),id(y)
Out[25]: (139768142922768, 139768142925584)

The ids are different (assigning the elements to variables keeps both objects alive, which prevents the possibly confusing recycling of ids).

Because the ids are different, is returns False:

In [26]: x is y
Out[26]: False

but the values are the same (by the == test):

In [27]: x == y
Out[27]: True
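
The session above can be condensed into a standalone script. This is only a minimal sketch; the name ids_differ_while_alive is made up for illustration. It relies on the fact that two objects that are alive at the same time cannot share an id, whereas the ids of temporaries that have already been discarded may be reused (a CPython implementation detail, so the last line is not guaranteed to print True).

```python
import numpy as np

a = np.arange(8)
newarray = a[np.array([3, 4, 2])]

# Each indexing call builds a fresh NumPy scalar object.
# Binding the results to names keeps both objects alive.
x = a[3]
y = newarray[0]

# Two objects that are alive at the same time never share an id.
ids_differ_while_alive = id(x) != id(y)
print(ids_differ_while_alive)  # True
print(x is y)                  # False: distinct objects
print(x == y)                  # True: equal values

# Here each temporary scalar can be freed as soon as id() returns,
# so the second scalar may be allocated at the same address.
# The result depends on the interpreter and is not guaranteed.
print(id(a[3]) == id(newarray[0]))
```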

Another 'unboxing', different id:

In [28]: w = a[3]
In [29]: w
Out[29]: 3
In [30]: id(w)
Out[30]: 139768133495504

These integers are actually np.int64 objects. Python does 'cache' small integers, but that does not apply here.

In [33]: type(x)
Out[33]: numpy.int64
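
As a rough illustration of why the small-integer cache does not come into play, the sketch below checks the scalar's type and then converts to built-in ints. The names px and py are hypothetical, and the exact integer type (np.int64 here) can vary by platform, so the check uses np.integer instead.

```python
import numpy as np

a = np.arange(8)
newarray = a[np.array([3, 4, 2])]
x = a[3]
y = newarray[0]

# NumPy integer scalars are their own type, not built-in Python ints,
# so Python's small-integer cache is not involved when indexing.
x_is_numpy_integer = isinstance(x, np.integer)
x_is_python_int = isinstance(x, int)
print(type(x).__name__, x_is_numpy_integer, x_is_python_int)

# Converting to built-in ints produces ordinary Python integers.
px, py = int(x), int(y)
print(px == py)  # True: same value
# Whether px and py are the same object is a CPython detail
# (small ints are usually cached), so do not rely on it.
print(px is py)
```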

We can see "where" the arrays store their data:

In [31]: a.__array_interface__['data']
Out[31]: (33696480, False)
In [32]: newarray.__array_interface__['data']
Out[32]: (33838848, False)

These are totally different buffers. If newarray were a view, the buffer pointers would be the same or nearby.
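
To make the view-versus-copy distinction concrete, here is a short sketch using np.shares_memory. The variable names fancy, sliced and offset are chosen for illustration only. A basic slice is used as an example of a view, which the code in the question does not contain.

```python
import numpy as np

a = np.arange(8)
fancy = a[np.array([3, 4, 2])]  # advanced indexing: a copy with its own buffer
sliced = a[2:5]                 # basic slicing: a view into a's buffer

print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, sliced))  # True

# The view's data pointer sits a few bytes into a's buffer:
# two elements past the start, because the slice begins at index 2.
offset = sliced.__array_interface__['data'][0] - a.__array_interface__['data'][0]
print(offset == 2 * a.itemsize)     # True

# Writing through the view changes a; writing to the copy does not.
sliced[0] = 99
fancy[0] = -1
print(a[2], a[3])                   # 99 3
```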

If we don't hang on to the indexed object, ids may be reused:

In [34]: id(newarray[0]), id(newarray[0])
Out[34]: (139768133493520, 139768133493520)

In general, is and id are not useful when working with NumPy arrays.
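
If the goal is simply to check that elements hold the same value, a value comparison is the usual tool. A minimal sketch, with same_value and same_contents as illustrative names:

```python
import numpy as np

a = np.arange(8)
newarray = a[np.array([3, 4, 2])]

# Compare values, not object identity.
same_value = bool(a[3] == newarray[0])

# Compare whole arrays element by element.
same_contents = np.array_equal(a[[3, 4, 2]], newarray)

print(same_value, same_contents)  # True True
```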

hpaulj