
While trying to properly understand numpy indexing rules I stumbled across the following. I used to think that a trailing Ellipsis in an index does nothing. Trivial, isn't it? Except it's not actually true:

Python 3.5.2 (default, Nov 11 2016, 04:18:53) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> 
>>> D2 = np.arange(4).reshape((2, 2))
>>>
>>> D2[[1, 0]].shape; D2[[1, 0], ...].shape
(2, 2)
(2, 2)
>>> D2[:, [1, 0]].shape; D2[:, [1, 0], ...].shape
(2, 2)
(2, 2)
>>> # so far so expected; now
... 
>>> D2[[[1, 0]]].shape; D2[[[1, 0]], ...].shape
(2, 2)
(1, 2, 2)
>>> # ouch!
...
>>> D2[:, [[1, 0]]].shape; D2[:, [[1, 0]], ...].shape
(2, 1, 2)
(2, 1, 2)

Now could someone in the know advise me as to whether this is a bug or a feature? And if the latter, what's the rationale?

Thanks in advance, Paul

Paul Panzer

1 Answer


Evidently there's some ambiguity in the interpretation of the [[1, 0]] index. Possibly the same thing discussed here:

Advanced slicing when passed list instead of tuple in numpy

I'll try a different array, to see if it makes things any clearer:

In [312]: D2=np.array([[0,0],[1,1],[2,2]])
In [313]: D2
Out[313]: 
array([[0, 0],
       [1, 1],
       [2, 2]])

In [316]: D2[[[1,0,0]]]
Out[316]: 
array([[1, 1],
       [0, 0],
       [0, 0]])
In [317]: _.shape
Out[317]: (3, 2)

Adding : or ..., or turning the index list into an array, makes numpy treat it as a (1,3) index and expand the dimensions of the result accordingly:

In [318]: D2[[[1,0,0]],:]
Out[318]: 
array([[[1, 1],
        [0, 0],
        [0, 0]]])
In [319]: _.shape
Out[319]: (1, 3, 2)
In [320]: D2[np.array([[1,0,0]])]
Out[320]: 
array([[[1, 1],
        [0, 0],
        [0, 0]]])
In [321]: _.shape
Out[321]: (1, 3, 2)

Note that if I apply transpose to the indexing array I get a (3,1,2) result

In [323]: D2[np.array([[1,0,0]]).T,:]
...
In [324]: _.shape
Out[324]: (3, 1, 2)
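As a small sketch of the rule at work in the array-index cases above: when the index is an explicit integer array, its shape replaces the indexed axis, so the result shape is the index shape followed by the remaining axes. The names `idx`, `row_form` and `col_form` are just illustrative.

```python
import numpy as np

D2 = np.array([[0, 0], [1, 1], [2, 2]])   # shape (3, 2)
idx = np.array([[1, 0, 0]])               # shape (1, 3)

# An integer array index on the first axis replaces that axis with idx.shape.
row_form = D2[idx]      # index shape (1, 3) -> result shape (1, 3, 2)
col_form = D2[idx.T]    # index shape (3, 1) -> result shape (3, 1, 2)

print(row_form.shape)
print(col_form.shape)
```

Passing an array (rather than a nested list) sidesteps the ambiguity entirely, since an array is never reinterpreted as a tuple.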

Without : or ..., it appears to strip off one layer of [] before applying it to the 1st axis:

In [330]: D2[[1,0,0]].shape
Out[330]: (3, 2)
In [331]: D2[[[1,0,0]]].shape
Out[331]: (3, 2)
In [333]: D2[[[[1,0,0]]]].shape
Out[333]: (1, 3, 2)
In [334]: D2[[[[[1,0,0]]]]].shape
Out[334]: (1, 1, 3, 2)
In [335]: D2[np.array([[[[1,0,0]]]])].shape
Out[335]: (1, 1, 1, 3, 2)

I think there's a backward compatibility issue here. We know that the tuple layer is 'redundant': D2[(1,2)] is the same as D2[1,2]. But for compatibility with early versions of numpy (Numeric), that first [] layer may be treated in the same way.
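To see what Python itself hands to `__getitem__` before numpy does anything with it, a throwaway class can simply return its key. `ShowKey` is a made-up helper for illustration, not anything from numpy:

```python
class ShowKey:
    """Hypothetical helper: returns whatever key Python passes in."""

    def __getitem__(self, key):
        return key


k = ShowKey()

print(k[1, 2])            # the comma already builds a tuple
print(k[(1, 2)])          # identical key, so the tuple layer is redundant
print(k[[1, 2]])          # a list stays a list
print(k[[[1, 0]]])        # nested list, still a list at this stage
print(k[[[1, 0]], ...])   # a tuple holding a nested list and Ellipsis
```

So any list-versus-tuple reinterpretation happens inside the array's own indexing code, not in the Python syntax.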

In that November question, I noted:

So at the top level a list and a tuple are treated the same, if the list can't be interpreted as an advanced indexing list.

Adding a ... is another way of keeping D2[[[0,1]]] from being read as D2[([0,1],)].

From @Eric's pull request, seberg explains:

 The tuple normalization is a rather small thing (it basically checks for a non-array sequence of length <= np.MAXDIMS, and if it contains another sequence, slice or None consider it a tuple).

[[1,2]] is a 1 element list with a list, so it is considered a tuple, i.e. ([1,2],). [[1,2]],... is a tuple already.
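Here is a rough pure-Python model of the rule as quoted, just to make the cases above easy to check by hand. It is only a sketch: the real logic lives in numpy's C code and may differ in detail, the function name `legacy_normalize_index` is made up, the value `MAXDIMS = 32` is an assumption, and "sequence" is narrowed to list and tuple for simplicity.

```python
MAXDIMS = 32  # assumed value of np.MAXDIMS; not taken from the quote


def legacy_normalize_index(index):
    """Hypothetical model of the quoted tuple normalization rule."""
    if isinstance(index, tuple):
        return index  # already a tuple, nothing to do
    if isinstance(index, list) and len(index) <= MAXDIMS:
        # A short list containing a sequence, slice or None is read as a tuple.
        if any(isinstance(item, (list, tuple, slice)) or item is None
               for item in index):
            return tuple(index)
    return (index,)  # otherwise the whole object indexes the first axis


print(legacy_normalize_index([1, 0, 0]))         # flat list: one index
print(legacy_normalize_index([[1, 0, 0]]))       # outer [] layer is stripped
print(legacy_normalize_index([[[1, 0, 0]]]))     # only one layer is stripped
print(legacy_normalize_index(([[1, 0]], ...)))   # tuple passes through as is
```

Under this model `[[1,0,0]]` and `[1,0,0]` normalize to the same tuple, which matches the identical (3, 2) shapes seen above, while `[[[1,0,0]]]` keeps a (1,3) index.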

hpaulj
  • Thanks for your helpful answer and the indeed highly relevant link. Do you have any advice as to how to, say, parse an index intercepted with, say, a reimplementation of `__getitem__` without the constant fear of missing yet another corner case? And, on the off-chance, do you happen to know whether there are any plans to sanitise this kind of behaviour in a future numpy version? (There are ways to trigger deprecation warnings, so the indexing logic cannot be set in stone.) – Paul Panzer Dec 20 '16 at 03:41
  • I haven't thought about your reimplementation of `__getitem__` case, so it's hard to imagine the potential problems. `np.lib.index_tricks` has several classes with custom `__getitem__` methods, but they don't pass the results on to an array. `np.apply_along_axis` is a good example of creating a complicated indexing object, first as an array, and then applying it with `x[tuple(i.tolist())]`. – hpaulj Dec 20 '16 at 04:57
  • Ok, here's a made-up use case (and apologies if that's straying too far from the original question): imagine I wanted to subclass ndarray to allow the user to put labels on axes (such as "time", "elevation", ...). Then my customised `__getitem__` would have to understand which axes of the array disappear, are shuffled, merged or newly created to be able to do the appropriate thing to the labels or raise an exception if there is no good solution. So to link back to the original question it should certainly know whether to expect (1, 2, 2) or (2, 2). – Paul Panzer Dec 20 '16 at 06:53
  • Hm, sadly, I do not really follow your `apply_along_axis` suggestion. Could you point me to an example? – Paul Panzer Dec 20 '16 at 07:14
  • _"Do you have any advice as how to, say, parse an index intercepted with, say, a reimplementation of getitem without the constant fear of missing yet another corner case"_ - there's some discussion of a patch for this [here](https://github.com/numpy/numpy/pull/8276) – Eric Dec 20 '16 at 12:08
  • So the key is a step summarized as `tuple normalization`. – hpaulj Dec 20 '16 at 13:16
  • @Eric Do I understand [this](https://github.com/numpy/numpy/pull/4434) correctly? You are actually right now working on changing numpy and eventually getting rid of this whole sometimes-treating-lists-as-if-they-were-tuples business? That's a funny coincidence! But most welcome. – Paul Panzer Dec 20 '16 at 16:05
  • Looks like the immediate goal is to refactor the indexing so this tuple normalization occurs in one C function, and to expose it as a Python function that you can include in your own `__getitem__`. Deprecation is likely to be a long way off. – hpaulj Dec 20 '16 at 17:05
  • @PaulPanzer: I wouldn't say I'm actively working on it. Both PRs have sort of stagnated. – Eric Dec 21 '16 at 11:48