I have a reference 2D numpy array R
with 50k rows and 1k columns, which is sorted along axis 0. Given a query 1D array x
, I can find the insert location of each value in x
into each column of R
with this code snippet:
R = np.array([[0, 8, 12, 2],
[3, 16, 13, 5],
[4, 19, 13, 11],
[6, 23, 16, 12]])
x = np.array([4, 9, 17, 6])
# Desired output:
# array([2, 1, 4, 2])
idx = (R < x).sum(axis = 0)
The problem is, now I have a list of query arrays, X
, where each row in X
is a query array, and X
can have as many rows as R
. What is the best way to approach this?
I tried the following:
R = np.array([[0, 8, 12, 2],
[3, 16, 13, 5],
[4, 19, 13, 11],
[6, 23, 16, 12]])
X = np.array([[4, 9, 17, 6],
[1, 2, 12, 20]])
# Desired output:
# array([[2, 1, 4, 2],
# [1, 0, 0, 4]])
# Approach 1 (requires too much memory):
idx = (R < X[:, None]).sum(axis = 1)
# Approach 2 (too slow):
idx = np.array([(R < x).sum(axis = 0) for x in X])
EDIT:
The searchsorted2d(a, b)
function in the accepted answer of this question does not solve the described problem. The input a is (m,n) 2D array and b is (m,) 1D array, whereas both of my arrays are 2D. It is possible to use searchsorted2d
in the following manner:
idx = np.array([searchsorted2d(R.T, x.T) for x in X])
But this approach would still be slow. Is there a vectorized, more general alternative to searchsorted2d
which would be more efficient?