If n=2 is quite small compared to a.shape[0], it might be worthwhile to use this little function. The basic idea is to calculate a mask that is just large enough to give the desired number of final rows. Here I do it iteratively. Normally iteration is slow, but if the number of iterations is small enough, the time saved elsewhere can be worth it.
import numpy as np

def mask(a):
    # the row condition being filtered on
    return a[:, 0] > 1

def paul_filter1(a, n):
    # incremental w/ sum: grow the masked slice until it covers n True rows
    j = a.shape[0]
    for i in xrange(n, j + 1):
        am = mask(a[:i, :])
        if np.sum(am) >= n:
            j = i
            break
    return a[am, :]
Note that the mask am can be shorter than the dimension it is working on. Effectively it pads the rest with False. I haven't checked whether this is documented.
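A minimal sketch of that padding behaviour, with a made-up array; note that newer numpy versions raise an IndexError for a short boolean mask, so the explicit np.where form is the version-safe equivalent:

import numpy as np

a = np.arange(12).reshape(4, 3)
am = np.array([True, False, True])     # shorter than a.shape[0] == 4

# Older numpy treated a[am, :] as if am were padded with False,
# selecting rows 0 and 2; newer versions raise an IndexError instead.
# The version-safe equivalent spells out the indices:
print(a[np.where(am)[0], :])           # rows 0 and 2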
In this small example, paul_filter1 is 3x slower than a[a[:,0]>1,:][:2,:]. But with a larger array, say a2=np.tile(a,[1000,1]), the time for paul_filter1 stays the same, while the 'brute force' keeps slowing down because it has to apply the mask to more rows. Of course these timings depend on where in a the desired rows are located. There won't be any savings if paul_filter1 has to use nearly all of the rows.
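To make that comparison concrete, here is one way to set it up. The small a below is invented for illustration, and this sticks with the answer's Python 2 / older-numpy setting, where the short boolean mask inside paul_filter1 is accepted:

import numpy as np

a = np.array([[0, 5], [2, 6], [3, 7], [1, 8], [4, 9]])
a2 = np.tile(a, [1000, 1])               # 5000 rows; matching rows appear early

brute = a2[a2[:, 0] > 1, :][:2, :]       # masks all 5000 rows, then trims
quick = paul_filter1(a2, 2)              # stops after scanning only a few rows

assert (brute == quick).all()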
edit
Addressing Bi Rico's concerns about the repeated np.sum (even though that is fast compiled code), we could incrementally build the where index list:
def paul_filter3(a, n):
    # incremental adding index: collect row indices until we have n of them
    j = a.shape[0]
    am = mask(a[:n, :])
    am = np.where(am)[0].tolist()
    if len(am) < n:
        for i in xrange(n, j):
            if mask(a[[i], :]):
                am.append(i)
                if len(am) >= n:
                    break
    am = np.array(am)
    return a[am, :]
For small n this is even faster.
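A quick sanity check, reusing the invented a2 from the earlier sketch:

import numpy as np

a2 = np.tile(np.array([[0, 5], [2, 6], [3, 7], [1, 8], [4, 9]]), [1000, 1])

# paul_filter3 picks the same rows as the brute-force mask-and-slice
assert (paul_filter3(a2, 2) == a2[a2[:, 0] > 1, :][:2, :]).all()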
Something closer to the original method is to calculate the full mask, but then trim it. cumsum can be used to find the minimum length.
def paul_filter4(a, n):
    # cumsum: find how far into the mask the nth True occurs, then trim
    am = mask(a)
    j = np.cumsum(am).searchsorted(n)
    return a[am[:j + 1], :]
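As a quick illustration of the cumsum/searchsorted step (the mask values here are made up):

import numpy as np

am = np.array([True, False, True, True, False, True])
np.cumsum(am)                        # array([1, 1, 2, 3, 3, 4])
j = np.cumsum(am).searchsorted(2)    # 2: the cumulative count first reaches 2 here
am[:j + 1]                           # array([ True, False,  True]) -- just long enough for n=2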
Tested with the 1000x1000 random array of integers (1:12), times are (using 20 instead of 2, and tweaking the mask so more rows are False; a rough reconstruction of this setup is sketched after the timings):
In [172]: timeit paul_filter4(a,20)
1000 loops, best of 3: 690 us per loop
In [173]: timeit paul_filter3(a,20)
1000 loops, best of 3: 1.22 ms per loop
In [175]: timeit paul_filter1(a,20)
1000 loops, best of 3: 994 us per loop
In [176]: timeit rico_filter(a,20)
1000 loops, best of 3: 668 us per loop
In [177]: timeit marco_filter(a,20)
10 loops, best of 3: 21 ms per loop
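The rough setup behind those numbers might look like this; only the shape and the value range (1:12) are stated above, so the seed and the tweaked threshold are guesses:

import numpy as np

np.random.seed(0)                          # for repeatability; not part of the original test
a = np.random.randint(1, 13, (1000, 1000)) # integers 1..12, shape as stated above

def mask(a):
    return a[:, 0] > 10                    # guessed threshold so more rows are False

# then, e.g. in IPython:
#   timeit paul_filter4(a, 20)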
rico_filter using where is fastest, but my alternative using cumsum isn't far behind. The 3 incremental filters are similar in speed, running at roughly half the speed of the fast ones.
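rico_filter and marco_filter come from the other answers and aren't reproduced here; a where-based filter in the spirit of rico_filter might look roughly like this (my guess, not Bi Rico's actual code):

def rico_filter_sketch(a, n):
    # take the first n row indices where the mask holds, then index once
    idx = np.where(mask(a))[0][:n]
    return a[idx, :]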
In this a, as generated and tested, most rows are True. That is consistent with marco's concern that the limit condition selects only a small subset of the rows satisfying the logical condition. Under these conditions, Bi Rico's concern that paul_filter1 could blow up is unrealistic.
If I change the testing parameters so that all of the rows of a have to be tested (a[:,0]>11), then the filters using where and cumsum take just as long as the original. The incremental filters are slower, by a factor of 15 or more. But my first attempt using np.sum is the fastest of that style.