I'm trying to take a slice from a large numpy array as quickly as possible using fancy indexing. I would be happy returning a view, but advanced indexing returns a copy.
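The view-vs-copy distinction can be checked directly with `np.shares_memory` (a smaller array is used here for brevity):

```python
import numpy as np

data = np.random.randn(1000, 50)
keep = np.random.rand(len(data)) > 0.5

view = data[1:-1:2, :]   # basic slicing: returns a view
copy = data[keep]        # boolean (advanced) indexing: returns a copy

print(np.shares_memory(data, view))  # True
print(np.shares_memory(data, copy))  # False
```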

I've tried solutions from here and here with no joy so far.

Toy data:

data = np.random.randn(int(1e6), 50)
keep = np.random.rand(len(data)) > 0.5

Using the default method:

%timeit data[keep] 
10 loops, best of 3: 86.5 ms per loop

Numpy take:

%timeit data.take(np.where(keep)[0], axis=0)
10 loops, best of 3: 83.1 ms per loop

%timeit np.take(data, np.where(keep)[0], axis=0)
10 loops, best of 3: 80.4 ms per loop

Method from here:

rows = np.where(keep)[0]
cols = np.arange(data.shape[1])
%timeit (data.ravel()[(cols + (rows * data.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
10 loops, best of 3: 159 ms per loop

Whereas if you're taking a view of the same size:

%timeit data[1:-1:2, :]
1000000 loops, best of 3: 243 ns per loop
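As a sanity check, the three indexing approaches above all select the same rows, so they are interchangeable in terms of results; a small sketch confirming this (smaller array for brevity):

```python
import numpy as np

data = np.random.randn(10000, 50)
keep = np.random.rand(len(data)) > 0.5

rows = np.where(keep)[0]
cols = np.arange(data.shape[1])

a = data[keep]                        # boolean fancy indexing
b = data.take(rows, axis=0)           # np.take on row indices
# flat-index trick: row r, col c maps to flat offset r*ncols + c
flat_idx = (cols + (rows * data.shape[1]).reshape((-1, 1))).ravel()
c = data.ravel()[flat_idx].reshape(rows.size, cols.size)

assert np.array_equal(a, b) and np.array_equal(a, c)
```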

1 Answer


There's no way to do this with a view. A view requires consistent strides, but the rows you're keeping are scattered at irregular intervals through the original array.
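To illustrate: a basic slice can be described by one fixed stride per axis, whereas rows picked out by a random mask sit at irregular byte offsets, so no single stride covers them. A small sketch (the exact stride values assume a C-contiguous float64 array):

```python
import numpy as np

data = np.random.randn(8, 3)  # C-contiguous float64: row stride 24 B, col stride 8 B

# A basic slice is a view: every 2nd row means a fixed (48, 8)-byte stride pair.
v = data[1:-1:2, :]
print(v.strides)  # (48, 8)

# Rows kept by a random mask have non-constant gaps, so no stride fits:
rows = np.array([0, 1, 4, 5])
print(np.diff(rows))  # [1 3 1] -- not constant
```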

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • Fair enough - I'm not necessarily tied to a view, any kind of speed up would be great. Thanks! – dgmp88 Aug 15 '16 at 20:01
  • 1
    Were you expecting something new, that wasn't covered in http://stackoverflow.com/questions/14386822/fast-numpy-fancy-indexing? – hpaulj Aug 15 '16 at 20:57
  • Yes, I was hoping for something different - those solutions work well in that case as they're slicing on both rows and columns, so the output matrix is much smaller. Here I'm only slicing on rows, and end up with a large output. – dgmp88 Aug 15 '16 at 22:10
  • My timings for `data[keep]` and `data[:500000].copy()` are in same ball park. So the time roughly scales with the number of items that have to be copied to the result. `data[::2,:].copy()` is even closer in time. `data[:,::2].copy()` a bit slower. – hpaulj Aug 16 '16 at 01:34
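A small sketch of the point in the last comment: `data[keep]` produces an array with the same shape and byte count as a contiguous copy of the same number of rows, so in both cases the dominant cost is copying that many elements:

```python
import numpy as np

data = np.random.randn(100000, 50)
keep = np.random.rand(len(data)) > 0.5
n = int(keep.sum())

fancy = data[keep]          # gathers n scattered rows into a new array
plain = data[:n].copy()     # copies n contiguous rows

# Same amount of data moved either way, which is why the timings
# land in the same ballpark.
assert fancy.shape == plain.shape
assert fancy.nbytes == plain.nbytes
```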