
I know how to slice a 1-dimensional sequence, arr[start:end], and how to access an element in the array, el = arr[row][col].

Now I'm trying something like slice = arr[0:2][0:2] (where arr is a numpy array), but instead of giving me the first 2 rows and columns, it just repeats the first 2 rows. What did I just do, and how do I slice along another dimension?

Asked by SlightlyCuban; edited by cottontail.

2 Answers


If you use numpy, this is easy:

slice = arr[:2,:2]

or, if you want to write the 0's explicitly,

slice = arr[0:2,0:2]

You'll get the same result.
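A minimal sketch, assuming a 4x4 example array built with np.arange, comparing the two-axis slice with the chained slice from the question:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)  # rows: [0..3], [4..7], [8..11], [12..15]

both_axes = arr[:2, :2]    # first 2 rows AND first 2 columns
explicit = arr[0:2, 0:2]   # same thing with the 0's written out
chained = arr[0:2][0:2]    # slices the rows twice; columns are untouched

print(both_axes)      # [[0 1]
                      #  [4 5]]
print(chained.shape)  # (2, 4)
```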

Note that slice is actually the name of a built-in type. Generally, I would advise giving your variable a different name.
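For context, the start:stop notation inside square brackets is turned into built-in slice objects, so arr[0:2, 0:2] passes a tuple of two of them. A small sketch (the array and variable names are illustrative) showing the equivalence and keeping the built-in name free:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

# arr[0:2, 0:2] is sugar for indexing with a tuple of built-in slice objects
rows_cols = (slice(0, 2), slice(0, 2))
print(np.array_equal(arr[rows_cols], arr[0:2, 0:2]))  # True

# Naming the result something other than `slice` keeps the built-in usable
top_left = arr[:2, :2]
print(slice(0, 2))  # slice(0, 2, None)
```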


Another way, if you're working with lists of lists*:

slice = [arr[i][0:2] for i in range(0,2)]

(Note that the 0's here are unnecessary: [arr[i][:2] for i in range(2)] would also work.)

Here I take each desired row one at a time (arr[i]), slice the columns I want out of that row, and add the result to the list I'm building.
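A runnable version for plain nested lists, assuming a small 3x3 list chosen for illustration:

```python
arr = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

# take rows 0 and 1, then columns 0 and 1 from each of those rows
top_left = [arr[i][:2] for i in range(2)]
print(top_left)  # [[1, 2], [4, 5]]

# equivalent: iterate over the first two rows directly instead of over indices
top_left_alt = [row[:2] for row in arr[:2]]
```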

If you naively try arr[0:2], you get the first 2 rows. If you then slice that again, as in arr[0:2][0:2], you're just slicing the first two rows over again: the second slice also selects rows, not columns.
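To see that the second slice acts on rows again, compare a few chained slices on the same hypothetical 3x3 list. With a start of 0 the repetition is invisible; with a non-zero start it becomes obvious:

```python
arr = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

first_two_rows = arr[0:2]       # [[1, 2, 3], [4, 5, 6]]
again = arr[0:2][0:2]           # slices that 2-row list again -> same two rows
print(again == first_two_rows)  # True

# rows 1..2 of arr, then items 1..2 of that 2-row list (only one exists)
shifted = arr[1:3][1:3]
print(shifted)                  # [[7, 8, 9]]
```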

*This actually works for numpy arrays too, but it will be slow compared to the "native" solution I posted above.
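A sketch of that footnote, assuming a 4x4 NumPy array: the comprehension runs on an ndarray, but it produces a Python list of 1-D row slices, which has to be stacked to get a 2-D array back, whereas the native slice does it in one indexing step:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

rows = [arr[i][:2] for i in range(2)]   # list of two 1-D arrays
print(type(rows), rows[0].shape)        # <class 'list'> (2,)

stacked = np.array(rows)                # rebuild a 2-D array from the rows
print(np.array_equal(stacked, arr[:2, :2]))  # True
```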

Answered by mgilson
  • So, what does `arr[0:2][0:2]` actually do compared to `arr[row][col]`? – SlightlyCuban Jun 24 '13 at 13:53
  • @mgilson I tried this and got a rather strange result. Suppose I had `a=[[1,2,3],[4,5,6],[7,8,9]]`; then `a[0][:]=[1, 2, 3]`, but strangely enough `a[:][0]` is still `[1,2,3]`. I would expect it to be `[1,4,7]`. I would appreciate it if you could tell me what is wrong here. – Alexander Cska Apr 13 '16 at 18:48
  • @AlexanderCska That's not that strange actually. Those two expressions are basically "Take the first element of `a` and return a copy of it" and "Copy `a` and return the first element of the copy". The fix is `[x[0] for x in a]`, or, if you're working with `numpy`, `a[:, 0]` – mgilson Apr 13 '16 at 19:46
  • @mgilson OK, I was trying to apply FORTRAN notation to Python, and I see now that things are a bit more involved. – Alexander Cska Apr 16 '16 at 21:26
  • @AlexanderCska -- If you are using `numpy`, they're actually remarkably similar... Without numpy, it's quite different (more C-like, in that a multidimensional array is an "array of pointers to arrays"). – mgilson Apr 17 '16 at 03:10
  • @mgilson I get it now: under normal circumstances `a[:]` is a copy and is used to avoid having multiple pointers referencing the same memory location. I just confused `C` and Python syntax. – Alexander Cska May 11 '16 at 09:24
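The example from the comment thread can be reproduced directly. This sketch uses the list `a` from the comment and shows both fixes mgilson suggests: the list comprehension for plain lists and `a[:, 0]` for NumPy:

```python
import numpy as np

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(a[0][:])   # [1, 2, 3]: a copy of the first row
print(a[:][0])   # [1, 2, 3]: the first row of a (shallow) copy of a

column = [x[0] for x in a]   # first column of a list of lists
print(column)                # [1, 4, 7]

a_np = np.array(a)
print(a_np[:, 0])            # [1 4 7]
```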

To slice a multi-dimensional array, the dimension (i.e. axis) must be specified. As the OP noted, arr[0:2][0:2] gives the same result as arr[0:2]. That is because arr[0:2] slices along the first axis (rows) and has the same number of dimensions as arr (you can confirm this with arr[0:2].ndim == arr.ndim), so the second slice is still applied along the first axis, which the first slice has already handled. In general, arr[i:j][m:n] slices the rows twice and never touches the columns. To slice along the second dimension, it must be explicitly specified, e.g.:
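A quick check of the ndim argument, assuming a 4x4 example array: a basic slice along axis 0 keeps the number of dimensions, so a second bracket containing a single slice indexes axis 0 again:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

rows = arr[0:2]
print(rows.ndim == arr.ndim)   # True: still 2-D, just fewer rows
print(rows.shape)              # (2, 4)

# A single slice in the second bracket acts on axis 0 of `rows` again
print(np.array_equal(rows[0:2], rows))   # True
print(arr[1:3][1:3].shape)               # (1, 4): only row 2 of arr survives
```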

arr[:2][:, :2]                   # its output is the same as `arr[:2, :2]`

A bare : means "take everything along that axis", so there's an implicit : for the second axis in the above code (i.e. arr[:2, :][:, :2]). The code above slices the first two rows (the first two subarrays along the first axis) and then slices the first two columns (the first two positions along the second axis) of the resulting array.
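The two steps in arr[:2][:, :2] can be traced through the intermediate shapes. A sketch, again assuming a 4x4 example array:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

step1 = arr[:2, :]     # first two rows; ':' keeps every column -> shape (2, 4)
step2 = step1[:, :2]   # every row of step1; first two columns -> shape (2, 2)

print(step1.shape, step2.shape)               # (2, 4) (2, 2)
print(np.array_equal(step2, arr[:2][:, :2]))  # True
print(np.array_equal(step2, arr[:2, :2]))     # True
```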

An Ellipsis (...) can be used in place of multiple colons (:), so for a general n-dimensional array (with at least two dimensions), the following all produce the same output:

w = arr[i:j, m:n]
x = arr[i:j, m:n, ...]
y = arr[i:j][:, m:n]
z = arr[i:j, ...][:, m:n, ...]
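To check the four forms on a genuinely n-dimensional case, here is a sketch with a 3-D array and slice bounds i, j, m, n picked arbitrarily for illustration:

```python
import numpy as np

arr = np.arange(4 * 5 * 6).reshape(4, 5, 6)   # 3-D example array
i, j, m, n = 1, 3, 0, 2                        # arbitrary slice bounds

w = arr[i:j, m:n]
x = arr[i:j, m:n, ...]
y = arr[i:j][:, m:n]
z = arr[i:j, ...][:, m:n, ...]

print(w.shape)  # (2, 2, 6): the last axis is taken in full
print(all(np.array_equal(w, other) for other in (x, y, z)))  # True
```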

That said, arr[:2, :2] is the canonical way. In the chained form arr[i:j][:, m:n], arr[i:j] first creates an intermediate array (a view, for basic slices), which is then indexed by [:, m:n]; that extra indexing step makes the chained form comparatively less efficient.
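Basic slices like these return views rather than copies, so the intermediate arr[i:j] does not duplicate the element data; what the chained form adds is an extra indexing call and a temporary view object. A sketch illustrating this with np.shares_memory, assuming a 4x4 example array:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

temp = arr[:2]            # intermediate result of the chained form
chained = temp[:, :2]
direct = arr[:2, :2]

# Basic slicing produces views: no element data is copied
print(np.shares_memory(temp, arr))      # True
print(np.shares_memory(chained, arr))   # True
print(np.array_equal(chained, direct))  # True
```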

However, there are cases where chained indexing makes sense (or is more readable), e.g., when you index a multi-dimensional array with lists of indices. For example, to select the top-left quarter of a 4x4 array using lists of row and column indices, chained indexing gives the expected 2x2 block, whereas a single indexing expression gives a different result: because of NumPy advanced indexing, the row and column lists are paired element-wise, so the result holds one value per index pair (here arr[0, 0] and arr[1, 1]).

import numpy as np
arr = np.arange(1,17).reshape(4,4)
rows = cols = [0,1]
arr[rows][:, cols]               # <--- correct output: [[1, 2], [5, 6]]
arr[rows, cols]                  # <--- wrong output (pairs the indices): [1, 6]
arr[[[e] for e in rows], cols]   # <--- correct output: [[1, 2], [5, 6]]
arr[np.ix_(rows, cols)]          # <--- correct output: [[1, 2], [5, 6]]
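np.ix_ works by turning the two 1-D index lists into arrays that broadcast against each other, the same shape trick the nested-list version [[e] for e in rows] relies on. A sketch showing the arrays it returns and the resulting selection, using the same 4x4 arr:

```python
import numpy as np

arr = np.arange(1, 17).reshape(4, 4)
rows = cols = [0, 1]

r, c = np.ix_(rows, cols)
print(r.shape, c.shape)   # (2, 1) (1, 2): a column vector and a row vector

# Broadcasting (2, 1) against (1, 2) gives a (2, 2) grid of index pairs
print(arr[r, c])          # [[1 2]
                          #  [5 6]]

# Without the reshaping, the lists are paired element-wise: (0, 0) and (1, 1)
print(arr[rows, cols])    # [1 6]
```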
Answered by cottontail