Approach #1
One way would be with np.isin
masking -
mask = np.isin(idx, indices, invert=True)
out = np.r_[indices, idx[mask]]
Approach #2 : Skipping the first argsort
Another with making those given indices minimum, thus forcing them to be at the start with argsorting
. We don't need idx
for this method as we are argsort-ing in our solution anyway -
def argsort_constrained(x, indices):
xc = x.copy()
xc[indices] = x.min()-np.arange(len(indices),0,-1)
return xc.argsort()
Benchmarking - Closer look
Let's study how does this entire thing of skipping the computation of starting argsort
idx
helps us with the second approach.
We will start off with the given sample :
In [206]: x = np.random.rand(1000)
In [207]: indices = np.array([10, 20, 30, 40, 50])
In [208]: %timeit argsort_constrained(x, indices)
38.6 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [209]: idx = np.argsort(x)
In [211]: %timeit np.argsort(x)
27.7 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [212]: %timeit in1d_masking(x, idx, indices)
...: %timeit isin_masking(x, idx, indices)
44.4 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
50.7 µs ± 303 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note that if you use np.concatenate
in place of np.r_
with these small datasets, you could do better.
So, argsort_constrained
has a total runtime cost of around 38.6 µs
, whereas the other two with masking have around 27.7 µs
on top of their individual timing numbers.
Let's scale up everything by 10x
and do the same experiments :
In [213]: x = np.random.rand(10000)
In [214]: indices = np.sort(np.random.choice(len(x), 50, replace=False))
In [215]: %timeit argsort_constrained(x, indices)
740 µs ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [216]: idx = np.argsort(x)
In [217]: %timeit np.argsort(x)
731 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [218]: %timeit in1d_masking(x, idx, indices)
...: %timeit isin_masking(x, idx, indices)
1.07 ms ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.02 ms ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Again, the individual runtime costs with masking ones are higher than with argsort_constrained
. And this trend should continue as we scale up further.