Compare two matrices and create a matrix of their common values

Question

I'm trying to compare two matrices in Python and return the matching rows as an "intersection matrix". Both matrices hold numerical data, and I want the rows that they have in common. (I have also tried building a matrix from the rows whose first-column entries match, along with an accompanying tuple.) The two matrices do not necessarily have the same dimensions.

Let's say I have two matrices with the same number of columns but an arbitrary number of rows (possibly very large, and different for each matrix):

23 3 4 5       23 3 4 5
12 6 7 8       45 7 8 9
45 7 8 9       34 5 6 7
67 4 5 6       3 5 6 7

For this small example, I'd like the "intersection" matrix to be:

23 3 4 5
45 7 8 9

Perhaps the inputs look like this instead:

1 2 3 4  2 4 6 7
2 4 6 7  4 10 6 9
4 6 7 8  5 6 7 8
5 6 7 8

In which case we only want:

2 4 6 7
5 6 7 8

I've tried things of this nature:

def compare(x):
    # This is a matrix I created with another function: purely numerical
    # data of arbitrary size with fixed column length D.
    y = n_c(data_cleaner(x))
    # This is a second matrix that I'd like to compare it to. Note that the
    # sizes are probably not the same, but the column lengths are.
    z = data_cleaner(x)
    # I initialized an array that would hold the matching values.
    compare = []
    # Nested for loop that checks a single index in one matrix against all
    # entries in the second matrix.
    for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
                # I want the row or the n-tuple (shown here) of those columns
                # with the matching first indexes, as shown above.
                c_vec = ([0][i], [15][i], [24][i], [0][25], [0][26])
                compare.append(c_vec)
            else:
                pass
    return compare

compare(c_i_w)

Sadly, I'm running into some errors. Specifically, it seems I'm telling Python to reference values improperly.

Not particularly. For really strict bookkeeping's sake, having the indexes in the first position sorted in increasing order is great. But as long as the array is consistent, I'll be throwing it into a scikit regression and cluster package, so it kind of doesn't matter :D – user7351362, Apr 04 '17 at 18:16

piRSquared · Accepted Answer · 2017-04-04T18:21:32.790


Consider the arrays a and b

import numpy as np

a = np.array([
        [23, 3, 4, 5],
        [12, 6, 7, 8],
        [45, 7, 8, 9],
        [67, 4, 5, 6]
    ])

b = np.array([
        [23, 3, 4, 5],
        [45, 7, 8, 9],
        [34, 5, 6, 7],
        [ 3, 5, 6, 7]
    ])

print(a)

[[23  3  4  5]
 [12  6  7  8]
 [45  7  8  9]
 [67  4  5  6]]

print(b)

[[23  3  4  5]
 [45  7  8  9]
 [34  5  6  7]
 [ 3  5  6  7]]

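Then we can use broadcasting to compare every row of a against every row of b. The result x is a boolean array in which x[i, j] is True when row i of a equals row j of b: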

x = (a[:, None] == b).all(-1)
print(x)

[[ True False False False]
 [False False False False]
 [False  True False False]
 [False False False False]]
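
To see why this works, it may help to look at the shapes at each step. The following is only an illustrative sketch; the names a3, eq, and x_check are introduced here and are not part of the answer above.

```python
import numpy as np

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

a3 = a[:, None]        # shape (4, 1, 4): a new axis so each row of a can pair with every row of b
eq = a3 == b           # shape (4, 4, 4): eq[i, j, k] is a[i, k] == b[j, k]
x_check = eq.all(-1)   # shape (4, 4): True where all columns of the two rows match

print(a3.shape, eq.shape, x_check.shape)
```

Note that the intermediate array eq has len(a) * len(b) * ncols elements, which is worth keeping in mind for large inputs.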

Using np.where we can identify the indices of the matching rows (i indexes into a, j indexes into b)

i, j = np.where(x)

Show which rows of a

print(a[i])

[[23  3  4  5]
 [45  7  8  9]]

And which rows of b

print(b[j])

[[23  3  4  5]
 [45  7  8  9]]

They are the same! That's good. That's what we wanted.
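
Since the question mentions that the matrices can be very large, here is a rough sketch of a different technique that avoids the (len(a), len(b), ncols) intermediate array: put the rows of b into a Python set of tuples and test each row of a for membership. This is not the broadcasting approach above, and common_rows_via_set is a hypothetical helper name chosen for illustration. It assumes exact equality between entries, just as == does.

```python
import numpy as np

def common_rows_via_set(a, b):
    """Return the rows of `a` that also occur in `b`, in `a`'s order."""
    # tolist() converts to plain Python numbers, so rows hash by value.
    b_rows = {tuple(row) for row in b.tolist()}
    keep = [tuple(row) in b_rows for row in a.tolist()]
    return a[np.array(keep, dtype=bool)]

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

print(common_rows_via_set(a, b))
```

The trade-off is a Python-level loop over the rows in exchange for memory that grows with len(a) + len(b) rather than with their product.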

We can put the results into a pandas dataframe with a MultiIndex with row number from a in the first level and row number from b in the second level.

import pandas as pd

pd.DataFrame(a[i], [i, j])

      0  1  2  3
0 0  23  3  4  5
2 1  45  7  8  9
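
If the two index levels are hard to tell apart, one option is to name them. The sketch below assumes the same a and b as above; the level names row_a and row_b are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[23, 3, 4, 5], [12, 6, 7, 8], [45, 7, 8, 9], [67, 4, 5, 6]])
b = np.array([[23, 3, 4, 5], [45, 7, 8, 9], [34, 5, 6, 7], [3, 5, 6, 7]])

# Row indices of the matching pairs: i into a, j into b.
i, j = np.where((a[:, None] == b).all(-1))

# Label the two levels so it is clear which matrix each row number refers to.
index = pd.MultiIndex.from_arrays([i, j], names=["row_a", "row_b"])
df = pd.DataFrame(a[i], index=index)
print(df)
```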

edited Apr 04 '17 at 18:21

answered Apr 04 '17 at 18:17


Or going all the way with booleans: `b[((a[:, None] == b).all(-1)).any(0)]`. – Divakar Apr 04 '17 at 18:20
Is this method valid if the column dimensionality of a and b is not equal? Still a really cool way to do this. Thanks for the reply! – user7351362 Apr 04 '17 at 18:21
@user7351362 Yes! This works if they have a different number of rows, but not if they have a different number of columns. – piRSquared Apr 04 '17 at 18:23
Oh, that's still beautiful. I got really excited and had to ask another question. I appreciate it! – user7351362 Apr 04 '17 at 18:25
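
As a rough sketch of the points made in these comments, the all-boolean variant can be wrapped in a small function and tried on the second example from the question, where the two matrices have different numbers of rows. The function name common_rows and the up-front shape check are assumptions added here for illustration, not part of the original answer.

```python
import numpy as np

def common_rows(a, b):
    """Return the rows of `b` that also appear somewhere in `a`."""
    a = np.asarray(a)
    b = np.asarray(b)
    # The row counts may differ, but the column counts must agree.
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("a and b must be 2-D with the same number of columns")
    # (len(a), len(b)) table of row matches, collapsed over the rows of a.
    mask = (a[:, None] == b).all(-1).any(0)
    return b[mask]

a = np.array([[1, 2, 3, 4], [2, 4, 6, 7], [4, 6, 7, 8], [5, 6, 7, 8]])
b = np.array([[2, 4, 6, 7], [4, 10, 6, 9], [5, 6, 7, 8]])

print(common_rows(a, b))
```

Here a has four rows and b has three, and the result contains the two shared rows, [2, 4, 6, 7] and [5, 6, 7, 8].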
