
I came across a code snippet where I could not understand two of the statements, though I could see the end result of each.

I will create a variable before giving the statements:

import numpy as np
train = np.random.random((10, 100))

One of them read as:

train = train[:-1, 1:-1]

What does this slicing mean, and how should I read it? I know that -1 in a slice counts from the end, but I cannot work out what this expression does.

Another statement read as follows:

la = [0.2**(7-j) for j in range(1,t+1)]
np.array(la)[:,None]

What does slicing with None as in [:,None] mean?

For both statements, besides an explanation of how to read each one, an alternative way of writing it would help me understand it better.

Amanda
  • Have you gone through the docs - https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#basic-slicing-and-indexing? That `None` is an alias for `np.newaxis`, generally used for broadcasting - https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html – Divakar Jun 01 '19 at 10:28
  • Create a small array with non-random data and look at the results; you will see more clearly what you get. – furas Jun 01 '19 at 10:31

1 Answer


One of Python's strengths is its uniform application of straightforward principles. Numpy indexing, like all indexing in Python, passes a single argument to the indexed object's (i.e., the array's) __getitem__ method, and numpy arrays were one of the primary justifications for the slicing mechanism (or at least one of its very early uses).

When I'm trying to understand new behaviours I like to start with a concrete and comprehensible example, so rather than 10x100 random values I'll start with a one-dimensional 4-element vector and work up to 3x4, which should be big enough to understand what's going on.

simple = np.array([1, 2, 3, 4])

train = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

The interpreter shows these as

array([1, 2, 3, 4])

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

The expression simple[x] is equivalent to (which is to say the interpreter ends up executing) simple.__getitem__(x) under the hood - note this call takes a single argument.

The numpy array's __getitem__ method implements indexing with an integer very simply: it selects a single element from the first dimension. So simple[1] is 2, and train[1] is array([5, 6, 7, 8]).
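As a minimal sketch of the two points above, using the same simple and train arrays, the following compares subscript syntax with an explicit __getitem__ call and shows what an integer index selects. The variable names first, same and row are only illustrative.

```python
import numpy as np

simple = np.array([1, 2, 3, 4])
train = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# Subscript syntax and an explicit __getitem__ call give the same result.
first = simple[1]
same = simple.__getitem__(1)
print(first, same)   # 2 2

# An integer index on a 2-D array selects one row along the first dimension.
row = train[1]
print(row)           # [5 6 7 8]
print(row.shape)     # (4,)
```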

When __getitem__ receives a tuple as an argument (which is how Python's syntax interprets expressions like array[x, y, z]) it applies each element of the tuple as an index to successive dimensions of the indexed object. So result = train[1, 2] is equivalent (conceptually - the code is more complex in implementation) to

temp = train[1]    # i.e. train.__getitem__(1)
result = temp[2]   # i.e. temp.__getitem__(2)

and sure enough we find that result comes out at 7. For integer indices, you could think of array[x, y, z] as equivalent to array[x][y][z].
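A short check of this equivalence, assuming the 3x4 train array defined earlier. The names by_tuple, explicit and chained are illustrative only.

```python
import numpy as np

train = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# train[1, 2] hands the tuple (1, 2) to __getitem__ as a single argument.
by_tuple = train[1, 2]
explicit = train.__getitem__((1, 2))

# Indexing one dimension at a time reaches the same element.
chained = train[1][2]

print(by_tuple, explicit, chained)   # 7 7 7
```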

Now we can add slicing to the mix. Expressions containing a colon can be regarded as slice literals (I haven't seen a better name for them), and the interpreter creates slice objects for them. As the documentation notes, a slice object is mostly a container for three values, start, stop and step, and it's up to each object's __getitem__ method how it interprets them. You might find this question helpful to understand slicing further.
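One way to see the slice objects the interpreter builds is a tiny class whose __getitem__ simply returns its argument. ShowKey is a hypothetical helper written for this illustration, not part of numpy or the standard library.

```python
class ShowKey:
    """Hypothetical helper: hands back whatever the subscript passes in."""

    def __getitem__(self, key):
        return key


show = ShowKey()

print(show[1:-1])        # slice(1, -1, None)
print(show[::2])         # slice(None, None, 2)
print(show[:-1, 1:-1])   # (slice(None, -1, None), slice(1, -1, None))

# The three values are available as attributes of the slice object.
s = show[1:-1]
print(s.start, s.stop, s.step)   # 1 -1 None
```

Any part left out of the slice literal shows up as None, and the comma-separated form arrives as a tuple of slices.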

With what you now know, you should be able to understand the answer to your first question.

result = train[:-1, 1:-1]

will call train.__getitem__ with a two-element tuple of slices. This is equivalent to

temp = train[:-1]
result = temp[..., 1:-1]

The first statement can be read as "set temp to all but the last row of train", and the second as "set result to all but the first and last columns of temp". train[:-1] is

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

and applying the [1:-1] subscripting to the second dimension of that array gives

array([[2, 3],
       [6, 7]])
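As an alternative spelling of the same operation, here is a sketch that builds the slices explicitly and then writes them with non-negative bounds. It also shows why chaining two slice subscripts is not the same thing, and applies the original expression to a 10x100 array like the one in the question. The names rows, cols, explicit, chained and big are illustrative.

```python
import numpy as np

train = np.arange(1, 13).reshape(3, 4)   # the same 3x4 array as above

rows = slice(None, -1)   # what :-1 produces: every row but the last
cols = slice(1, -1)      # what 1:-1 produces: drop first and last column
result = train[rows, cols]
print(result)
# [[2 3]
#  [6 7]]

# The same selection written with explicit non-negative bounds.
n_rows, n_cols = train.shape
explicit = train[0:n_rows - 1, 1:n_cols - 1]
print(np.array_equal(result, explicit))   # True

# Chaining slices is different: the second slice is applied to rows again.
chained = train[:-1][1:-1]
print(chained.shape)                      # (0, 4)

# On an array shaped like the one in the question.
big = np.random.random((10, 100))
print(big[:-1, 1:-1].shape)               # (9, 98)
```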

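The ellipsis in the temp subscript says "pass everything" for the leading dimension, so here the subscript expression [...] can be considered equivalent to [:]. As far as None values inside a slice are concerned, a slice has a maximum of three data points: _start_, _stop_ and _step_. A None value for any of these gives the default value, which (for a positive step) is 0 for _start_, the length of the indexed object for _stop_, and 1 for _step_. So x[None:None:None] is equivalent to x[0:len(x):1], which is equivalent to x[::].

In your second statement, though, the None in [:, None] is not part of a slice. It is an index in its own right, which numpy treats as an alias for np.newaxis, so it adds a new dimension of length 1 at that position.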
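For the second statement in the question, the comment on the question points out that None used as an index is an alias for np.newaxis. The sketch below shows the effect on the shape, together with two alternative spellings. The value t = 3 is an assumption made for the example, since the question does not define t, and the names flat and column are illustrative.

```python
import numpy as np

t = 3  # assumed value for illustration; the question leaves t undefined
la = [0.2 ** (7 - j) for j in range(1, t + 1)]

flat = np.array(la)
column = flat[:, None]   # keep the existing axis, then add a new one

print(flat.shape)        # (3,)
print(column.shape)      # (3, 1)

# np.newaxis is literally None, so both spellings give the same array.
print(np.newaxis is None)                            # True
print(np.array_equal(column, flat[:, np.newaxis]))   # True

# reshape is another way to get the same column vector.
print(np.array_equal(column, flat.reshape(-1, 1)))   # True

# Putting None first adds the new axis at the front instead.
print(flat[None, :].shape)                           # (1, 3)
```

A column of shape (3, 1) is typically wanted so that the values broadcast against the rows of a 2-D array.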

With this knowledge under your belt you should stand a bit more chance of understanding what's going on.

holdenweb