Numpy array indexing behavior

Question

I was playing with numpy array indexing and found this odd behavior. When I index with an np.array or a list, it works as expected:

 In[1]: arr = np.arange(10).reshape(5,2)
        arr[ [1, 1] ]
Out[1]: array([[2, 3],
               [2, 3]])

But when I use a tuple, it gives me a single element:

 In[1]: arr = np.arange(10).reshape(5,2)
        arr[ (1, 1) ]
Out[1]: 3

A similar strange tuple vs. list difference occurs with arr.flat:

 In[1]: arr = np.arange(10).reshape(5,2)

 In[2]: arr.flat[ [3, 4] ]
Out[2]: array([3, 4])

 In[3]: arr.flat[ (3, 4) ]
Out[3]: IndexError: unsupported iterator index

I can't understand what is going on under the hood. What is the difference between a tuple and a list in this case?

Python 3.5.2
NumPy 1.11.1

Andras Deak -- Слава Україні · Accepted Answer · 2016-12-20T15:23:15.413

What's happening is called fancy indexing, or advanced indexing. Indexing with integers or slices behaves differently from indexing with a list/array. The trick is that multidimensional indexing actually works with tuples due to the implicit tuple syntax:

import numpy as np
arr = np.arange(10).reshape(5,2)
arr[2,1] == arr[(2,1)] # exact same thing: 2,1 matrix element

However, using a list (or array) inside an index expression will behave differently:

arr[[2,1]]

will index into arr with 2, then with 1, so first it fetches arr[2]==arr[2,:], then arr[1]==arr[1,:], and returns these two rows (row 2 and row 1) as the result.
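To make the row-picking concrete, here is a small sketch that builds the same result by hand. The names picked and by_hand are just illustrative, not anything from NumPy:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# A list index picks whole rows along the first axis, in the order given.
picked = arr[[2, 1]]

# The same result assembled manually from the individual rows.
by_hand = np.stack([arr[2], arr[1]])

print(picked)                            # row 2 followed by row 1
print(np.array_equal(picked, by_hand))   # both approaches agree
```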

It gets funkier:

print(arr[1:3,0:2])
print(arr[[1,2],[0,1]])

The first one is regular indexing: it slices rows 1 to 2 and columns 0 to 1 (inclusive), giving you a 2x2 subarray. The second one is fancy indexing: it gives you arr[1,0] and arr[2,1] in an array, i.e. it indexes selectively into your array using, essentially, the zip() of the index lists.
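The zip() analogy can be checked directly. This is only a sketch of the equivalence for two integer lists of equal length; the variable names (rows, cols, via_zip) are made up for the example:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)
rows, cols = [1, 2], [0, 1]

# Slicing: every combination of rows 1..2 and columns 0..1, a 2x2 block.
block = arr[1:3, 0:2]

# Fancy indexing: the two index lists are paired up element by element.
paired = arr[rows, cols]

# The same pairing written as an explicit loop over zip(rows, cols).
via_zip = np.array([arr[r, c] for r, c in zip(rows, cols)])

print(block.shape, paired.shape)         # 2-D block versus 1-D result
print(np.array_equal(paired, via_zip))   # fancy indexing matches the zip loop
```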

Now here's why flat works like that: it returns a flatiter of your array. From help(arr.flat):

class flatiter(builtins.object)
 |  Flat iterator object to iterate over arrays.
 |  
 |  A `flatiter` iterator is returned by ``x.flat`` for any array `x`.
 |  It allows iterating over the array as if it were a 1-D array,
 |  either in a for-loop or by calling its `next` method.

So the resulting iterator from arr.flat behaves as a 1d array. When you do

arr.flat[ [3, 4] ]

you're accessing two elements of that virtual 1d array using fancy indexing; it works. But when you're trying to do

arr.flat[ (3,4) ]

you're attempting to access the (3,4) element of a 1d (!) array, which is erroneous. The reason this raises "unsupported iterator index" rather than the usual "too many indices" IndexError is probably only that arr.flat handles this indexing case itself.
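A quick way to compare the two cases side by side, as a sketch. It assumes the tuple index raises IndexError on arr.flat, as in the question's output; the exact error message may differ between NumPy versions, so it is only printed here:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# A list is a fancy index into the flattened (1-D) view: two elements.
two_elements = arr.flat[[3, 4]]

# A tuple of two integers asks for a 2-D element, which a 1-D object lacks.
try:
    arr.flat[(3, 4)]
    flat_error = None
except IndexError as exc:
    flat_error = exc

# An ordinary 1-D array rejects the same tuple index as well.
try:
    arr.ravel()[(3, 4)]
    ravel_error = None
except IndexError as exc:
    ravel_error = exc

print(two_elements)
print(repr(flat_error))
print(repr(ravel_error))
```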

Thank you for `tuple` insight, I really never thought about `tuple`'s implicit nature. And in the case of `flat`, I think it will be better if it raised: `too many indexes` because it is clearer. — godaygo, Dec 20 '16 at 19:14

score 1 · Answer 2 · edited May 23 '17 at 12:24

In [387]: arr=np.arange(10).reshape(5,2)

With this list, you are selecting 2 rows from arr

In [388]: arr[[1,1]]
Out[388]: 
array([[2, 3],
       [2, 3]])

It's the same as if you explicitly marked the column slice (with : or ...)

In [389]: arr[[1,1],:]
Out[389]: 
array([[2, 3],
       [2, 3]])

Using an array instead of a list works: arr[np.array([1,1]),:]. (It also eliminates some ambiguities.)

With the tuple, the result is the same as if you wrote the indexing without the tuple wrapper. So it selects an element with row index of 1, column index of 1.

In [390]: arr[(1,1)]
Out[390]: 3
In [391]: arr[1,1]
Out[391]: 3

The arr[1,1] is translated by the interpreter to arr.__getitem__((1,1)). As is common in Python, 1,1 is shorthand for (1,1).
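You can see what actually reaches __getitem__ without NumPy at all. KeyProbe below is a hypothetical throwaway class written for this illustration; it just hands back whatever key the brackets passed in:

```python
class KeyProbe:
    """Hypothetical helper: returns the key exactly as __getitem__ received it."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

bare = probe[1, 1]        # comma-separated indices arrive as one tuple
wrapped = probe[(1, 1)]   # the explicit tuple is indistinguishable from the above
listed = probe[[1, 1]]    # a list arrives as a list, a single index object
sliced = probe[1:3, 0]    # slices are packed into the tuple as slice objects

print(bare, wrapped, listed, sliced)
```

So by the time NumPy sees the index, arr[1,1] and arr[(1,1)] are already the same call, while the list stays a separate kind of object that NumPy treats as an advanced index.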

In the arr.flat cases you are indexing the array as if it were 1d. np.arange(10)[[2,3]] selects 2 items, while np.arange(10)[(2,3)] is 2d indexing, hence the error.

A couple of recent questions touch on a messier corner case. Sometimes the list is treated as a tuple. The discussion might be enlightening, but don't go there if it's confusing.

Advanced slicing when passed list instead of tuple in numpy

numpy indexing: shouldn't trailing Ellipsis be redundant?
