Fast numpy fancy indexing

Question

My code for slicing a numpy array (via fancy indexing) is very slow. It is currently a bottleneck in program.

a.shape
(3218, 6)

ts = time.time(); a[rows][:, cols]; te = time.time(); print('%.8f' % (te-ts));
0.00200009

What is the correct numpy call to get an array consisting of the subset of rows 'rows' and columns 'col' of the matrix a? (in fact, I need the transpose of this result)

`time.time` isn't a great way to measure timing. Generally, it's better to use `timeit` instead. — mgilson, Jan 17 '13 at 19:52
1. What is your program doing exactly? 2. Use a proper python profiler. I find it quite unlikely that slicing is your bottleneck — mbatchkarov, Jan 17 '13 at 19:57
If the slicing is the problem than you could consider using a view instead — Wolph, Jan 17 '13 at 20:00
mgilson: I know about timeit, but since this call is deep within a code and I didn't know how to pass in local variable values (`a` in this case). I don't think it's equivalent to `a[rows,cols]` if rows and cols are not consecutive ranges. Right? mbatchkarov: I found this bottleneck by profiling using `cProfile` and `pstats`. My program manipulates large arrays for a genetics application. Thanks both for your feedback. — Oren, Jan 17 '13 at 20:01
I should mention I only need the sub-array for reading, not writing. So I'll look into a view. thx — Oren, Jan 17 '13 at 20:02
Oren - If you use `@mgilson` style mentions it will send a notification to the user (one per comment). — agf, Jan 17 '13 at 20:10
@mgilson: I remember having issues with that in some cases (4 years ago though), might not apply anymore. The manual says the following `For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices` http://docs.scipy.org/doc/numpy/user/basics.indexing.html?highlight=slice#index-arrays — Wolph, Jan 18 '13 at 08:39
@Wolph It is still true for [Numpy 1.15](https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.indexing.html): _Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)_ — bartolo-otrit, Oct 04 '18 at 06:53

score 21 · Answer 1 · edited Mar 30 '23 at 12:24

Let my try to summarize the excellent answers by Jaime and TheodrosZelleke and mix in some comments.

Advanced (fancy) indexing always returns a copy, never a view.
a[rows][:,cols] implies two fancy indexing operations, so an intermediate copy a[rows] is created and discarded. Handy and readable, but not very efficient. Moreover beware that [:,cols] usually generates a Fortran contiguous copy from a C-cont. source.
a[rows.reshape(-1,1),cols] is a single advanced indexing expression basing on the fact that rows.reshape(-1,1) and cols are broadcast to the shape of the intended result.
A common experience is that indexing in a flattened array can be more efficient than fancy indexing, so another approach is
```
indx = rows.reshape(-1,1)*a.shape[1] + cols
a.take(indx)
```

or

    a.take(indx.flat).reshape(rows.size,cols.size)

Efficiency will depend on memory access patterns and whether the starting array is C-countinous or Fortran continuous, so experimentation is needed.
Use fancy indexing only if really needed: basic slicing a[rstart:rstop:rstep, cstart:cstop:cstep] returns a view (although not continuous) and should be faster!

tzelleke · Answer 2 · 2013-01-17T21:06:42.277

To my surprise this, kind of lenghty expression, which calculates first linear 1D-indices, is more than 50% faster than the consecutive array indexing presented in the question:

(a.ravel()[(
   cols + (rows * a.shape[1]).reshape((-1,1))
   ).ravel()]).reshape(rows.size, cols.size)

UPDATE: OP updated the description of the shape of the initial array. With the updated size the speedup is now above 99%:

In [93]: a = np.random.randn(3218, 1415)

In [94]: rows = np.random.randint(a.shape[0], size=2000)

In [95]: cols = np.random.randint(a.shape[1], size=6)

In [96]: timeit a[rows][:, cols]
10 loops, best of 3: 186 ms per loop

In [97]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 1.56 ms per loop

INITAL ANSWER: Here is the transcript:

In [79]: a = np.random.randn(3218, 6)
In [80]: a.shape
Out[80]: (3218, 6)

In [81]: rows = np.random.randint(a.shape[0], size=2000)
In [82]: cols = np.array([1,3,4,5])

Time method 1:

In [83]: timeit a[rows][:, cols]
1000 loops, best of 3: 1.26 ms per loop

Time method 2:

In [84]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 568 us per loop

Check that results are actually the same:

In [85]: result1 = a[rows][:, cols]
In [86]: result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)

In [87]: np.sum(result1 - result2)
Out[87]: 0.0

It is also about twice as fast as the more standard answer I provided, for the new requirement of the OP. Nice! — Jaime, Jan 17 '13 at 21:08
Not to say that such tricks can't speed things up (at least in specific cases), but all of this heavily relies on the fact that the input array is C-Contiguous. — seberg, Jan 17 '13 at 22:09
No surprise: see this [answer](http://stackoverflow.com/a/11813040/1499402) to a related question. — Stefano M, Jan 18 '13 at 10:24

Jaime · Accepted Answer · 2013-01-17T20:35:19.483

You can get some speed up if you slice using fancy indexing and broadcasting:

from __future__ import division
import numpy as np

def slice_1(a, rs, cs) :
    return a[rs][:, cs]

def slice_2(a, rs, cs) :
    return a[rs[:, None], cs]

>>> rows, cols = 3218, 6
>>> rs = np.unique(np.random.randint(0, rows, size=(rows//2,)))
>>> cs = np.unique(np.random.randint(0, cols, size=(cols//2,)))
>>> a = np.random.rand(rows, cols)
>>> import timeit
>>> print timeit.timeit('slice_1(a, rs, cs)',
                        'from __main__ import slice_1, a, rs, cs',
                        number=1000)
0.24083110865
>>> print timeit.timeit('slice_2(a, rs, cs)',
                        'from __main__ import slice_2, a, rs, cs',
                        number=1000)
0.206566124519

If you think in term of percentages, doing something 15% faster is always good, but in my system, for the size of your array, this is taking 40 us less to do the slicing, and it is hard to believe that an operation taking 240 us will be your bottleneck.

Turns out I have a 3218x1415 array, not 3218x6. I am extracting only a few columns and lots of rows. The code above reveals that slice_1 call time is 0.08 sec, slice_2 time 0.0004 sec. Maybe that's what I need! — Oren, Jan 17 '13 at 20:48

score 1 · Answer 4 · answered Mar 29 '18 at 18:51

Using np.ix_ you can a similar speed to ravel/reshape, but with code that is more clear:

a = np.random.randn(3218, 1415)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.random.randint(a.shape[1], size=6)
a = np.random.randn(3218, 1415)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.random.randint(a.shape[1], size=6)

%timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
#101 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


%timeit ix_ = np.ix_(rows, cols); a[ix_]
#135 µs ± 7.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

ix_ = np.ix_(rows, cols)
result1 = a[ix_]
result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)

np.sum(result1 - result2)
0.0

Fast numpy fancy indexing

4 Answers4

Linked