0

In other words, I need to define the index correspondence between equal unique numbers inside two unsorted 2d arrays. Similar questions:

  1. how to find indices of a 2d numpy array occuring in another 2d array - not about single values, but rows/columns

  2. test for membership in a 2d numpy array - not about single values, but rows/columns

  3. Pythonic way of finding indexes of unique elements in two arrays, 1d sorted arrays

  4. Finding the indexed location of values in a unsorted numpy array from data in another unsorted numpy array is about 1d unsorted arrays

There are two 2d arrays with unique numbers, say: x = [[45, 67], [32, 52], [94, 64], [21, 90]], and y = [[67, 103, 12], [2, 61, 77], [70, 94, 18]]. The numbers 67, 94 are common for these two lists.

Is there a faster solution to get the index correspondence like: [[[0, 1], [0, 0]], [[2, 0], [2, 1]]] than the proposed bellow, if each array is of thousands elements?

Yerbol Sapar
  • 31
  • 1
  • 8

2 Answers2

1

We can use a dict to store values in the first 2d array, then go through the second 2d array to see if the dict contains the key.

x = [[45, 67], [32, 52], [94, 64], [21, 90]]
y = [[67, 103, 12], [2, 61, 77], [70, 94, 18]]

cor_indices = []

m1 = len(x)
n1 = len(x[0])

x_values = dict()

for i in range(m1):
    for j in range(n1):
        x_values[x[i][j]] = [i, j]

m2 = len(y)
n2 = len(y[0])
for i in range(m2):
    for j in range(n2):
        if y[i][j] in x_values:
            cor_indices.append([x_values[y[i][j]], [i, j]])

print(cor_indices)

Not a very pythonic way, but I think it's more efficient, with the time complexity of O(m1 * n1 + m2 * n2) and space complexity as O(m1 * n1).

Haoliang
  • 1,184
  • 4
  • 11
0

1st, get the mask of shape x that shows common values in both x&y(y then is flattened, as described at numpy.isin) for unique values:

a = np.isin(x, y, assume_unique=True)
a
array([[False,  True],
       [False, False],
       [ True, False],
       [False, False]])

2nd, apply np.argwhere to the mask with term > 0, it returns the indices of True in the mask, that is, address of common values 67 & 94 inside array x:

np.argwhere(a > 0)    
array([[0, 1],
       [2, 0]]) 

3rd, points 1&2 above applied to array y returns address of the same common values 67 & 94, but inside array y:

b = np.isin(y, x, assume_unique=True) 
np.argwhere(b > 0)    
array([[0, 0],
       [2, 1]])

4th, use np.stack((np.argwhere(a > 0), np.argwhere(b > 0)), axis=1) for convenient reading:

array([[[0, 1],
        [0, 0]],
       [[2, 0],
        [2, 1]]])

which means, the 1st common element 67 is in x at index [0, 1] and in y at [0, 0]; the second 94 in x: [2, 0], in y: [2, 1].

5th, to see the common values in both arrays use numpy 'fancy index', converting x&y to numpy array beforehand:

xi = np.array(x)[a]    
xi
array([67, 94])    
yi = np.array(y)[b]
yi
array([67, 94])

Here is might be a problem, if the order of common values is not the same. E.g., in case y = [[94, 103, 12], [2, 61, 77], [70, 67, 18]], np.array(y)[np.isin(y, x, assume_unique=True)] will give: yi = array([94, 67]) vs. xi = array([67, 94]). The use of np.stack((a, b), axis=1) makes sense only for mutually ordered indices of common values. Therefore, after the point 3 of the solution, we must do 5. (i.e., get the flat array of common values per list), and, by argsort() get the sorted index array in xi&yi. For the new y and old x that index arrays look like:

xi, yi = np.argsort(xi), np.argsort(yi)
yi
array([1, 0])
xi
array([0, 1])

And now, it is OK to use np.stack with 'fancy index':

np.stack((np.argwhere(a > 0)[xi], np.argwhere(b > 0)[yi]), axis=1)
array([[[0, 1],
        [2, 1]],
       [[2, 0],
        [0, 0]]])

If put together, the final proposed solution is:

def indx_correspnd(x, y):
    a = np.isin(x, y, assume_unique=True) 
    b = np.isin(y, x, assume_unique=True)
    xi = np.array(x)[a]
    yi = np.array(y)[b]
    xi, yi = np.argsort(xi), np.argsort(yi)
    return np.stack((np.argwhere(a > 0)[xi], np.argwhere(b > 0)[yi]), axis=1)

Use case1:

import numpy as np

x = [[45, 67], [32, 52], [94, 64], [21, 90]]
y = [[94, 103, 12], [2, 61, 77], [70, 67, 18]]

indx_correspnd(x, y)
array([[[0, 1],
        [2, 1]],
       [[2, 0],
        [0, 0]]])

Use case2, application to 2x2d lists: 4000 elements placed in 80 sublists by 50 & 4200 elements placed in 105 sublists by 40:

f=random.sample(range(1, 5000), 4000)
g=random.sample(range(1, 5000), 4200)
f=np.array(f).reshape(-1, 50)
g=np.array(g).reshape(-1, 40)

indx_correspnd(g, f)
array([[[52, 43],
    [11,  2]],
   [[38, 17],
    [29, 31]],
   [[74, 27],
    [45,  8]],
   ...,
   [[66, 38],
    [47,  7]],
   [[ 8,  3],
    [11,  6]],
   [[20, 39],
    [47, 26]]])
Yerbol Sapar
  • 31
  • 1
  • 8
  • Please include an explanation with your answer to help readers understand how this works, and solves the problem. You can click the edit button at the bottom of your answer to add an explanation. Additionally, you may find it beneficial reading [how to answer](https://stackoverflow.com/help/how-to-answer) – Freddy Mcloughlan Jun 23 '22 at 23:09