2

I have a reference 2D numpy array R with 50k rows and 1k columns, which is sorted along axis 0. Given a query 1D array x, I can find the insert location of each value in x into each column of R with this code snippet:

R = np.array([[0, 8, 12, 2],
              [3, 16, 13, 5],
              [4, 19, 13, 11],
              [6, 23, 16, 12]])

x = np.array([4, 9, 17, 6])

# Desired output:
# array([2, 1, 4, 2])

idx = (R < x).sum(axis = 0)

The problem is, now I have a list of query arrays, X, where each row in X is a query array, and X can have as many rows as R. What is the best way to approach this?

I tried the following:

R = np.array([[0, 8, 12, 2],
              [3, 16, 13, 5],
              [4, 19, 13, 11],
              [6, 23, 16, 12]])

X = np.array([[4, 9, 17, 6],
              [1, 2, 12, 20]])

# Desired output:
# array([[2, 1, 4, 2],
#       [1, 0, 0, 4]])

# Approach 1 (requires too much memory):
idx = (R < X[:, None]).sum(axis = 1)


# Approach 2 (too slow):
idx = np.array([(R < x).sum(axis = 0) for x in X])

EDIT: The searchsorted2d(a, b) function in the accepted answer of this question does not solve the described problem. The input a is (m,n) 2D array and b is (m,) 1D array, whereas both of my arrays are 2D. It is possible to use searchsorted2d in the following manner:

idx = np.array([searchsorted2d(R.T, x.T) for x in X])

But this approach would still be slow. Is there a vectorized, more general alternative to searchsorted2d which would be more efficient?

0 Answers0