
I'm trying to compare two matrices in Python and return their matching rows as an "intersection matrix". Both matrices hold numerical data, and I want the rows they have in common. (I have also tried building a matrix from the entries that match in the first column and then creating an accompanying tuple.) The matrices do not necessarily have the same dimensions.

Let's say I have two matrices with the same number of columns but an arbitrary number of rows (possibly very large, and different between the two):

23 3 4 5       23 3 4 5
12 6 7 8       45 7 8 9
45 7 8 9       34 5 6 7
67 4 5 6       3 5 6 7

I'd like to create a matrix holding the "intersection", which for this low-dimensional example is

23 3 4 5
45 7 8 9

Perhaps the input looks like this instead:

1 2 3 4  2 4 6 7
2 4 6 7  4 10 6 9
4 6 7 8  5 6 7 8
5 6 7 8

in which case we only want:

2 4 6 7
5 6 7 8

I've tried things of this nature:

def compare(x):
    # This is a matrix I created with another function: purely numerical data of arbitrary size with fixed column length D
    y = n_c(data_cleaner(x))
    # This is a second matrix that I'd like to compare it to. Note that the sizes are probably not the same, but the column lengths are
    z = data_cleaner(x)
    # I initialized an array that would hold the matching values
    compare = []
    # Create a nested for loop that checks a single index in one matrix against all entries in the second matrix
    for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
                # I want the row or the n-tuple (shown here) of those columns with the matching first indexes as shown above
                c_vec = ([0][i],[15][i],[24][i],[0][25],[0][26])
                compare.append(c_vec)
            else:
                pass
    return compare

compare(c_i_w)

Sadly, I'm running into errors. Specifically, it seems that I'm indexing the values incorrectly.

Jason Aller
user7351362
1 Answer


Consider the arrays a and b

import numpy as np

a = np.array([
        [23, 3, 4, 5],
        [12, 6, 7, 8],
        [45, 7, 8, 9],
        [67, 4, 5, 6]
    ])

b = np.array([
        [23, 3, 4, 5],
        [45, 7, 8, 9],
        [34, 5, 6, 7],
        [ 3, 5, 6, 7]
    ])

print(a)

[[23  3  4  5]
 [12  6  7  8]
 [45  7  8  9]
 [67  4  5  6]]

print(b)

[[23  3  4  5]
 [45  7  8  9]
 [34  5  6  7]
 [ 3  5  6  7]]

Then we can broadcast a against b and get a boolean array whose entry (i, j) is True when row i of a equals row j of b:

x = (a[:, None] == b).all(-1)
print(x)

[[ True False False False]
 [False False False False]
 [False  True False False]
 [False False False False]]
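
To see why the broadcast works, it can help to look at the intermediate shapes. The sketch below redefines a and b so it runs on its own; the names expanded and elementwise are only illustrative.

```python
import numpy as np

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

# Insert a length-1 axis so every row of a can be paired with every row of b.
expanded = a[:, None]            # shape (4, 1, 4)

# Broadcasting compares each row of a with each row of b, element by element.
elementwise = expanded == b      # shape (4, 4, 4)

# A pair of rows matches only if all of its elements match.
x = elementwise.all(-1)          # shape (4, 4)

print(expanded.shape, elementwise.shape, x.shape)
```

The middle array holds one boolean per (row of a, row of b, column) triple, so its size grows with the product of the two row counts.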

Using np.where we can identify the indices

i, j = np.where(x)

Show which rows of a

print(a[i])

[[23  3  4  5]
 [45  7  8  9]]

And which rows of b

print(b[j])

[[23  3  4  5]
 [45  7  8  9]]

They are the same! That's good. That's what we wanted.
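
The same idea should carry over when the two arrays have different numbers of rows, as in the second example from the question. A minimal sketch, assuming both inputs are 2-D with the same number of columns; the helper name common_rows is made up for illustration.

```python
import numpy as np

def common_rows(a, b):
    """Return the rows of a that also appear as rows of b (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # matches[i, j] is True when row i of a equals row j of b.
    matches = (a[:, None] == b).all(-1)
    # Keep a row of a if it matches at least one row of b.
    return a[matches.any(axis=1)]

first = np.array([[1, 2, 3, 4], [2, 4, 6, 7], [4, 6, 7, 8], [5, 6, 7, 8]])
second = np.array([[2, 4, 6, 7], [4, 10, 6, 9], [5, 6, 7, 8]])

result = common_rows(first, second)
print(result)
```

Using any(axis=1) instead of np.where keeps each matching row of the first array once, even if it appears several times in the second.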

We can put the results into a pandas DataFrame with a MultiIndex: the row number from a in the first level and the row number from b in the second level.

import pandas as pd

pd.DataFrame(a[i], [i, j])

      0  1  2  3
0 0  23  3  4  5
2 1  45  7  8  9
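
The question mentions that the matrices can be very large. The broadcast comparison builds an intermediate array with one boolean per (row of a, row of b, column) triple, which may not fit in memory in that case. One possible alternative, sketched here with a made-up helper name common_rows_hashed, is to hash the rows of one array into a set and test the rows of the other against it. It assumes exact equality is what is wanted, so it is best suited to integer data.

```python
import numpy as np

def common_rows_hashed(a, b):
    """Rows of a that also occur in b, using a set lookup (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Tuples are hashable, so each row of b becomes one set entry.
    seen = {tuple(row) for row in b.tolist()}
    # One membership test per row of a, with no len(a) x len(b) intermediate.
    keep = np.array([tuple(row) in seen for row in a.tolist()], dtype=bool)
    return a[keep]

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

common = common_rows_hashed(a, b)
print(common)
```

This trades the vectorized comparison for a Python-level loop, so it is likely slower on small inputs but uses memory roughly proportional to the two inputs rather than to their product.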
piRSquared