
Today I used a NumPy array for some calculations and ran into a strange problem. For example, assume I have already imported numpy.arange in IPython, and I run the following:

In [5]: foo = arange(10)                                                      

In [8]: foo1 = foo[arange(3)]                                                 

In [11]: foo1[:] = 0                                                          

In [12]: foo
Out[12]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]: foo2 = foo[0:3]                                                      

In [19]: foo2[:]=0                                                            

In [21]: foo
Out[21]: array([0, 0, 0, 3, 4, 5, 6, 7, 8, 9])

The above shows that when I index the array with foo[arange(3)], I get a copy of the array slice, but when I slice the array with foo[0:3], I get a reference to the array slice, so foo changes with foo2. I then expected foo and foo2 to have the same id, but that does not seem to be true:

In [59]: id(foo)
Out[59]: 27502608

In [60]: id(foo2)
Out[60]: 28866880

In [61]: id(foo[0])
Out[61]: 38796768

In [62]: id(foo2[0])
Out[62]: 38813248

...

Even stranger, if I keep checking the ids of foo[0] and foo2[0], they are not constant, and sometimes they do match each other!

In [65]: id(foo2[0])
Out[65]: 38928592

In [66]: id(foo[0])                                                          
Out[66]: 37111504

In [67]: id(foo[0])
Out[67]: 38928592

Can anyone explain this a little? I am really confused by this dynamic feature of Python.

Thanks a lot.

shelper

1 Answer

foo[arange(3)]

is not a slice. The elements of arange(3) are used to select elements of foo to construct a new array. Since this can't efficiently return a view (every element of the view would have to be an independent reference, and operations on the view would require following far too many pointers), it returns a new array.

foo[0:3]

is a slice. This can be done efficiently as a view; it only requires adjusting some bounds. Thus, it returns a view.
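The difference between the two indexing forms can be checked directly. The following is a minimal sketch, assuming NumPy is imported as np; the names copied and viewed are illustrative and do not appear in the question.

```python
import numpy as np

foo = np.arange(10)

copied = foo[np.arange(3)]  # integer-array indexing: builds a new array
viewed = foo[0:3]           # basic slicing: a view onto foo's buffer

print(np.shares_memory(foo, copied))  # False
print(np.shares_memory(foo, viewed))  # True
print(viewed.base is foo)             # True: the view keeps a reference to foo

# Writing through the copy leaves foo untouched ...
copied[:] = 0
after_copy_write = foo.tolist()

# ... while writing through the view changes foo.
viewed[:] = 0
after_view_write = foo.tolist()

print(after_copy_write)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(after_view_write)  # [0, 0, 0, 3, 4, 5, 6, 7, 8, 9]
```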

id(foo[0])

foo[0] doesn't refer to a specific Python object. Keeping separate Python objects for every array element would be far too expensive, negating much of the benefit of numpy. Instead, when an indexing operation is performed on a numpy ndarray, numpy constructs a new object to return. You'll get a different object with a different ID every time.
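This also accounts for the ids that occasionally match. In CPython, id() returns the object's memory address, and it is only guaranteed to be unique among objects that exist at the same time. The temporary scalar created by foo[0] is discarded as soon as id() returns, so a later temporary may be placed at the same address. A small sketch, assuming NumPy is imported as np (the names a and b are illustrative), that keeps both scalars alive so their ids cannot coincide:

```python
import numpy as np

foo = np.arange(10)

# Hold references so both scalar objects exist at the same time.
a = foo[0]
b = foo[0]

print(isinstance(a, np.integer))  # True: a NumPy scalar, not a slot in foo
print(a == b)                     # True: same value
print(a is b)                     # False: two separately constructed objects
print(id(a) == id(b))             # False while both are alive
```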

user2357112
  • Well, then why is id(foo) also different from id(foo2)? Do they use the first element's address as their address? – shelper Oct 25 '13 at 23:56
  • @shelper: foo isn't foo2. Although they have the same shape, dtype, etc., and although they use the same storage for their elements, they are different objects. I don't think the ID you receive has any relation to the addresses of the array elements; it's the address of a header containing array metadata and a pointer to the storage used for the elements. – user2357112 Oct 26 '13 at 00:09
  • Well, I think I understand the issue: foo and foo2 are both wrapped Python objects, and id(foo) just shows the address of the Python object, not of the memory that contains the data, which can actually be obtained with `foo.__array_interface__['data']`. – shelper Oct 26 '13 at 00:10
  • You can check whether two arrays share the same base memory by looking at their `.base` attribute, i.e. in your case `foo2.base is foo` should evaluate to `True`, since `foo` owns its data and `foo2` is a view of it. – Jaime Oct 26 '13 at 03:29
  • @Jaime - If I recall correctly, there can be cases where `foo.base` is not the same as `foo2.base` even though they share the same memory buffer. It holds true for all slicing operations, but not everywhere. Things that create a view through `__array_interface__` (for example `np.lib.stride_tricks.as_strided`) won't necessarily show the same base. As I understand it, `numpy.may_share_memory(foo, foo2)` is the preferred way to check whether two arrays share the same memory (though I could be wrong there). – Joe Kington Oct 26 '13 at 03:53
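
The checks discussed in these comments can be tried on the arrays from the question. This is a rough sketch, assuming NumPy is imported as np and a version that provides np.shares_memory; the array foo3 and the variables addr_foo, addr_foo2 and offset are illustrative additions.

```python
import numpy as np

foo = np.arange(10)
foo2 = foo[0:3]

# Two distinct Python objects (array headers), so different ids.
print(foo is foo2)  # False

# The data pointers match, because the slice starts at element 0.
addr_foo = foo.__array_interface__['data'][0]
addr_foo2 = foo2.__array_interface__['data'][0]
print(addr_foo == addr_foo2)  # True

# foo owns its buffer; foo2 is a view whose base is foo.
print(foo.base is None)  # True
print(foo2.base is foo)  # True

# Overlap checks that do not rely on .base.
print(np.may_share_memory(foo, foo2))  # True (bounds check only)
print(np.shares_memory(foo, foo2))     # True (exact check)

# A slice that starts later points further into the same buffer.
foo3 = foo[3:6]
offset = foo3.__array_interface__['data'][0] - addr_foo
print(offset == 3 * foo.itemsize)  # True
```

np.may_share_memory only compares memory bounds, so it can report True for arrays that do not actually overlap; np.shares_memory performs the exact check at a potentially higher cost.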