
I have a 2D array (a) whose sorted rows serve as lookup tables, and a 1D array (v) of values whose insertion indices I want to find:

import numpy as np

# [EDIT] Added more records, including a row that contains NaNs
a = np.array(
[[0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
 [0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
 [0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578]])

v = np.array([641.954, 56554.498, 168078.307, 1331.692, 2233.327, 1120.03, 641.954, 56554.498, 168078.307, 1331.692, 2233.327])

This is the result I want to get:

[1, 9, 10, 6, 9, 0, 1, 9, 10, 6, 9]

Obviously, with a for loop I can index into a and v row by row like this:

for i in range(len(a)):
    print(np.searchsorted(a[i], v[i]))

Is there a vectorized way to do this that is more efficient?
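For reference, the loop baseline can also collect its results into an array. Here is a small sketch with made-up data (not my real arrays):

```python
import numpy as np

a = np.array([[0., 10., 20.],
              [0., 5., 50.]])
v = np.array([15., 7.])

# Row-wise searchsorted with a plain Python loop -- the baseline to vectorize
out = np.array([np.searchsorted(row, val) for row, val in zip(a, v)])
print(out)  # [2 2]
```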

kokon

1 Answer


Inspired by Vectorized searchsorted numpy for the underlying idea, here's one that works between a 2D array and a 1D array -

def searchsorted2d(a, b):
    # Inputs : a is an (m,n) 2D array with sorted rows; b is an (m,) 1D array.
    # Computes np.searchsorted(a[i], b[i]) for every row in a vectorized way
    # by scaling/offsetting both inputs and then using one searchsorted call.

    # Per-row offsets large enough to keep the scaled rows disjoint and sorted
    s = np.r_[0, (np.maximum(a.max(1) - a.min(1) + 1, b) + 1).cumsum()[:-1]]
    a_scaled = (a + s[:, None]).ravel()
    b_scaled = b + s

    # Search the flattened scaled array, then subtract the row offsets
    return np.searchsorted(a_scaled, b_scaled) - np.arange(len(s)) * a.shape[1]
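To see why the offsetting trick is valid, here is a small hand-checkable illustration (an editorial sketch with made-up data, not from the original answer; the routine is repeated so the sketch is self-contained):

```python
import numpy as np

def searchsorted2d(a, b):
    # Same vectorized routine as above
    s = np.r_[0, (np.maximum(a.max(1) - a.min(1) + 1, b) + 1).cumsum()[:-1]]
    a_scaled = (a + s[:, None]).ravel()
    b_scaled = b + s
    return np.searchsorted(a_scaled, b_scaled) - np.arange(len(s)) * a.shape[1]

a_small = np.array([[1., 3., 5., 7.],
                    [2., 4., 6., 8.]])
b_small = np.array([4., 9.])

# The offsets push each scaled row past the previous row's range, so one
# flat searchsorted over the concatenation answers all rows at once.
vec = searchsorted2d(a_small, b_small)
loop = np.array([np.searchsorted(row, x) for row, x in zip(a_small, b_small)])
print(vec, loop)  # both [2 4]
```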

Output on the given sample (the original five-row one, before the NaN rows were added in the edit) -

In [101]: searchsorted2d(a,v)
Out[101]: array([ 1,  9, 10,  6,  9])

Case with all-NaN rows

To extend this to work with all-NaN rows, we need a few more steps -

valid_mask = ~np.isnan(a).any(1)   # rows that contain no NaNs
out = np.zeros(len(a), dtype=int)  # all-NaN rows default to 0
out[valid_mask] = searchsorted2d(a[valid_mask], v[valid_mask])
Divakar
  • Thanks a lot. It's a great idea and worked well with the sample, but with the real data which contains NaNs, some issues came up. Could you give me an idea to handle this? – kokon Jun 07 '19 at 01:19
  • @kokon Please post sample data, the expected output and hopefully a working solution with loops even. – Divakar Jun 07 '19 at 04:53
  • Since there's a limited number of characters in this comment, I updated the sample in the original post above to include NaN records. Currently with your solution, the data after NaN records returns negative numbers, which are different from the expected output: array([ 1, 9, 10, 6, 9, 0, -10, -20, -30, -40, -50]) – kokon Jun 07 '19 at 07:28
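To confirm that the masking approach from the answer handles all-NaN rows, here is a minimal self-contained sketch (made-up data with one all-NaN row; the routine is repeated for completeness):

```python
import numpy as np

def searchsorted2d(a, b):
    # Vectorized row-wise searchsorted from the answer above
    s = np.r_[0, (np.maximum(a.max(1) - a.min(1) + 1, b) + 1).cumsum()[:-1]]
    a_scaled = (a + s[:, None]).ravel()
    b_scaled = b + s
    return np.searchsorted(a_scaled, b_scaled) - np.arange(len(s)) * a.shape[1]

a = np.array([[0., 10., 20.],
              [np.nan, np.nan, np.nan],
              [0., 5., 50.]])
v = np.array([15., 1., 7.])

# Run the vectorized search only on NaN-free rows; all-NaN rows stay 0
valid_mask = ~np.isnan(a).any(1)
out = np.zeros(len(a), dtype=int)
out[valid_mask] = searchsorted2d(a[valid_mask], v[valid_mask])
print(out)  # [2 0 2]
```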