What is happening here? It seems the ids of the array's elements do not stay fixed. The `is` operator returns False even though the ids are the same, and after printing the array the ids of the elements change. Any explanations?

import numpy as np
a = np.arange(27)
b = a[1:5]
a[0] is b[1] #False
id(a[0]) #40038736L
id(b[1]) #40038736L
a #prints the array
id(b[1]) #40038712L
id(a[0]) #40038712L
b[0] #1
a[1] #1
id(b[0]) #40038712L
id(a[1]) #40038784L
figs_and_nuts
  • This may work with lists, which contain pointers to objects. But an array stores values in a data buffer. `a[2]` is a new object with a reference to that buffer, but its `id` has nothing to do with `a` or the data buffer. So `id` is not useful when working with arrays. – hpaulj Oct 22 '16 at 21:07
  • Half-duplicate of http://stackoverflow.com/questions/3877230/why-does-id-id-and-id-id-in-cpython. I wish we had a "split duplicate" feature. – user2357112 Oct 22 '16 at 21:08

1 Answer

First test with a list:

In [1109]: a=[0,1,2,3,4]
In [1112]: b=a[1:3]

In [1113]: id(a[1])
Out[1113]: 139407616
In [1114]: id(b[0])
Out[1114]: 139407616

In [1115]: a[1] is b[0]
Out[1115]: True

Later I tried:

In [1129]: id(1)
Out[1129]: 139407616

So the object in a[1] is consistently the integer 1. (The id of an integer is a bit tricky and implementation dependent; CPython caches small integers, so every 1 here is the same object.)

But with an array:

In [1118]: aa=np.arange(5)
In [1119]: ba=aa[1:]

In [1121]: aa[1]
Out[1121]: 1
In [1122]: ba[0]
Out[1122]: 1
In [1123]: id(aa[1])
Out[1123]: 2925837264
In [1124]: id(ba[0])
Out[1124]: 2925836912

The ids are totally different; in fact they change with each access:

In [1125]: id(aa[1])
Out[1125]: 2925837136
In [1126]: id(ba[0])
Out[1126]: 2925835104

That's because aa[1] isn't just the integer 1. It is an np.int32 object.

In [1127]: type(aa[1])
Out[1127]: numpy.int32
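
This also explains the puzzle in the question, where `is` returned False although the two ids printed the same. Each `id(a[0])` call builds a temporary scalar that is freed as soon as `id` returns, so CPython may hand the same address to the next temporary. A minimal sketch, with the variable names `x` and `y` introduced only for illustration, that keeps both objects alive so their ids cannot coincide:

```python
import numpy as np

aa = np.arange(5)

# Bind both scalars to names so neither is freed before the comparison.
x = aa[1]
y = aa[1]

print(type(x))          # a NumPy integer scalar type (width depends on the platform)
print(x is y)           # False: two separate objects
print(id(x) == id(y))   # False while both objects are alive
print(x == y)           # True: the values are equal
```

Without the names `x` and `y`, the two ids may or may not match from one run to the next, which is why the numbers in the question appear to jump around.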

In contrast to a list, the values of an array are stored as bytes in a data buffer. aa[1:] is a view and accesses the same data buffer. But aa[1] is a new np.int32 object, created on each access, that holds its own copy of the value. In contrast to the list case, aa[1] is not a stored second object of aa.
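
The difference between a view and an extracted element can be checked by writing to the array afterwards. This is only a sketch; the value 100 and the name `x` are arbitrary choices for the illustration:

```python
import numpy as np

aa = np.arange(5)
ba = aa[1:]            # view on aa's data buffer
x = aa[1]              # NumPy scalar extracted before the write

aa[1] = 100            # write through the original array

print(ba[0])           # 100: the view sees the change
print(x)               # 1: the scalar extracted earlier is unaffected
print(ba.base is aa)   # True: ba is a view of aa
```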

In general, id is not useful when working with arrays, and neither is the is test. Use == or np.isclose (for floats).
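
As a rough illustration of those alternatives: the sketch below compares values with `==`, `np.array_equal` and `np.isclose`, and uses `np.shares_memory` for the question `is` was presumably meant to answer, namely whether two arrays use the same data. The float example is an assumption added here, not part of the original session.

```python
import numpy as np

aa = np.arange(5)
ba = aa[1:]

# Value comparisons
print(aa[1] == ba[0])                    # True
print(np.array_equal(aa[1:], ba))        # True

# Do two arrays overlap in memory?
print(np.shares_memory(aa, ba))          # True: ba is a view of aa
print(np.shares_memory(aa, aa.copy()))   # False: a copy has its own buffer

# Floats: prefer a tolerance over exact equality
f = np.float64(0.1) + np.float64(0.2)
print(f == 0.3)                          # False because of rounding
print(np.isclose(f, 0.3))                # True
```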

================

A way to see where the values of aa are stored is with:

In [1137]: aa.__array_interface__
Out[1137]: 
{'data': (179274256, False),      # 'id' so to speak of the databuffer
 'descr': [('', '<i4')],
 'shape': (5,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}
In [1138]: ba.__array_interface__
Out[1138]: 
{'data': (179274260, False),    # this is 4 bytes larger
 'descr': [('', '<i4')],
 'shape': (4,),
 'strides': None,
 'typestr': '<i4',
 'version': 3}

The data pointers of the two arrays are related because ba is a view: it starts one 4-byte element after the start of aa.
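
The same relationship can be checked without reading the addresses by eye. A small sketch, where `addr_aa` and `addr_ba` are names made up for this example; it compares against `aa.itemsize` rather than a hard-coded 4, since the default integer width varies by platform:

```python
import numpy as np

aa = np.arange(5)
ba = aa[1:]

# First entry of 'data' is the address of the array's first element.
addr_aa = aa.__array_interface__['data'][0]
addr_ba = ba.__array_interface__['data'][0]

# ba starts exactly one element further into the same buffer.
print(addr_ba - addr_aa == aa.itemsize)   # True
```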

aa[1] is array-like and also has a data buffer, but it isn't a view.

In [1139]: aa[1].__array_interface__
Out[1139]: 
{'__ref': array(1),
 'data': (182178952, False),
 ...}
hpaulj
  • Would you mind shedding a little more light on why the ids change with each access for aa[1]? Is it garbage collected and created afresh each time id is called on it? – figs_and_nuts Oct 22 '16 at 21:38
  • It's not a matter of garbage collection. `aa[1]` is produced by a method call, `aa.__getitem__((1))`. It is creating a new `np.int32` object each time. – hpaulj Oct 22 '16 at 22:03