Find common elements in 2D numpy arrays

Question

If I have two (or more) 2D arrays, how can I keep only the elements common to all of them, matched on a given column? For example, I have arrays in the format:

time, position, velocity, acceleration

I want the two arrays to contain only the same time elements, i.e. the values in column 0. I can use

np.intersect1d(array1[:, 0], array2[:, 0])

which gives all the common times, but I want to either extract all matching rows from array1 and array2 or remove the rows whose time is not common to both. In the end array1 and array2 will have exactly the same dimensions, so I could write:

pos_difference = array1[:, 1] - array2[:, 1]

The arrays could be different sizes, so for example:

array1 = [[1, 100.0, 0.0, 0.0], [2, 110.0, 0.0, 0.0], [3, 120.0, 0.0, 0.0]]
array2 = [[1, 101.0, 0.0, 0.0], [3, 119, 0.0, 0.0]]

I want to extract only the common time elements, so that array1 and array2 contain only the rows where Time=1 and Time=3, since those are the common times. Then I can write:

pos_difference = array1[:, 1] - array2[:, 1]

and this will be the position differences between the two arrays at the same time:

# First row will be when time=1 and second row will be when time=3
pos_difference = [[0, -1, 0.0, 0.0], [0, 1, 0.0, 0.0]]

It would help if you include a minimal example `array1` and `array2` plus the expected result. I think I know what you need but I'm not sure because I can't compare it to your actual arrays and expectation. — MSeifert, May 30 '17 at 15:32
@MSeifert I gave a small example, hopefully it makes more sense — user2840470, May 30 '17 at 17:04
Thank you. That's clear! One further question though: Are the times in each array unique and sorted? — MSeifert, May 30 '17 at 17:27
Take a look at Pandas library, it looks like a good use case. — heltonbiker, May 30 '17 at 18:23

MSeifert · Accepted Answer · 2017-05-30T17:32:47.350

If you have these arrays:

import numpy as np
array1 = np.array([[1, 100.0, 0.0, 0.0], [2, 110.0, 0.0, 0.0], [3, 120.0, 0.0, 0.0]])
array2 = np.array([[1, 101.0, 0.0, 0.0], [3, 119, 0.0, 0.0]])

As you said, you can use np.intersect1d to get the intersection; the only thing remaining is to index the arrays:

intersect = np.intersect1d(array1[:, 0], array2[:, 0])

array1_matches = array1[np.any(array1[:, 0] == intersect[:, None], axis=0)]
array2_matches = array2[np.any(array2[:, 0] == intersect[:, None], axis=0)]

And then you can subtract them:

>>> array1_matches - array2_matches
array([[ 0., -1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.]])
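
To make the indexing step less opaque, here is a small sketch of what the broadcast comparison produces for the example arrays. The names `comparison` and `mask1` are introduced only for illustration and are not part of the answer above.

```python
import numpy as np

array1 = np.array([[1, 100.0, 0.0, 0.0], [2, 110.0, 0.0, 0.0], [3, 120.0, 0.0, 0.0]])
array2 = np.array([[1, 101.0, 0.0, 0.0], [3, 119, 0.0, 0.0]])

intersect = np.intersect1d(array1[:, 0], array2[:, 0])  # common times: 1 and 3

# intersect[:, None] has shape (2, 1); comparing it with the shape-(3,) time
# column broadcasts to shape (2, 3). Row k marks where array1's time equals
# intersect[k].
comparison = array1[:, 0] == intersect[:, None]

# Collapsing over axis 0 leaves one boolean per row of array1.
mask1 = np.any(comparison, axis=0)

print(comparison.shape)  # (2, 3)
print(mask1)             # [ True False  True]
```

The boolean mask has one entry per row of array1, so `array1[mask1]` keeps the rows for time 1 and time 3 in their original order.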

This assumes that your times are unique and sorted. If they are unsorted, you could sort both arrays by time first:

>>> array1 = array1[np.argsort(array1[:, 0])]
>>> array2 = array2[np.argsort(array2[:, 0])]

If the times are not unique, I have no idea how you want to handle that, so I can't advise you there.
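
For the unsorted case, one possible alternative to sorting first is the `return_indices=True` option of np.intersect1d (available since NumPy 1.15), which also returns the positions of the common values in each input. The sketch below assumes the times are unique and uses a deliberately shuffled copy of array1; for repeated values NumPy returns the index of the first occurrence only, so this does not settle the non-unique case either.

```python
import numpy as np

# Same data as in the example, but with array1's rows out of order.
array1 = np.array([[3, 120.0, 0.0, 0.0], [1, 100.0, 0.0, 0.0], [2, 110.0, 0.0, 0.0]])
array2 = np.array([[1, 101.0, 0.0, 0.0], [3, 119, 0.0, 0.0]])

# common holds the sorted shared times; idx1 and idx2 are the positions of
# those times in array1[:, 0] and array2[:, 0] respectively.
common, idx1, idx2 = np.intersect1d(array1[:, 0], array2[:, 0], return_indices=True)

# Indexing with idx1 / idx2 selects the matching rows and aligns them by time.
array1_matches = array1[idx1]
array2_matches = array2[idx2]
diff = array1_matches - array2_matches

print(idx1, idx2)  # [1 0] [0 1]
print(diff)
```

Because both index arrays follow the order of `common`, the rows of `diff` correspond to time 1 and time 3 even though array1 was not sorted.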

score 0 · Answer 2 · answered Jun 01 '17 at 06:44

You want to use numpy.in1d.

array1 = array1[np.in1d(array1[:,0], array2[:,0], assume_unique=True)]
array2 = array2[np.in1d(array2[:,0], array1[:,0], assume_unique=True)]

Or if you don't want to change your originals:

array3 = array1[np.in1d(array1[:,0], array2[:,0], assume_unique=True)]
array4 = array2[np.in1d(array2[:,0], array3[:,0], assume_unique=True)]

Notice that in both cases I'm using the reduced array as the target of the second in1d to reduce search time. If you want to optimize even more, you can wrap it in an if statement to ensure the smaller array is the target of the first in1d.

Then just compute array3 - array4. As a function, with the size check:

def common_subtract(a1, a2, i=0, unique=True):
    # Keep only rows whose value in column i occurs in both arrays, then subtract.
    a1, a2 = np.array(a1), np.array(a2)
    if a1.shape[0] > a2.shape[0]:
        # a1 is larger: filter it first, so the smaller a2 is the lookup target.
        a1 = a1[np.in1d(a1[:, i], a2[:, i], assume_unique=unique)]
        a2 = a2[np.in1d(a2[:, i], a1[:, i], assume_unique=unique)]
    else:
        a2 = a2[np.in1d(a2[:, i], a1[:, i], assume_unique=unique)]
        a1 = a1[np.in1d(a1[:, i], a2[:, i], assume_unique=unique)]
    return a1 - a2
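
As a quick check, here is the same function run against the example from the question. This sketch swaps np.in1d for np.isin, which takes the same arguments here; np.in1d is deprecated as of NumPy 2.0, and the swap is an assumption of this sketch rather than part of the answer.

```python
import numpy as np

def common_subtract(a1, a2, i=0, unique=True):
    # Same logic as the function above, with np.isin in place of np.in1d.
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape[0] > a2.shape[0]:
        a1 = a1[np.isin(a1[:, i], a2[:, i], assume_unique=unique)]
        a2 = a2[np.isin(a2[:, i], a1[:, i], assume_unique=unique)]
    else:
        a2 = a2[np.isin(a2[:, i], a1[:, i], assume_unique=unique)]
        a1 = a1[np.isin(a1[:, i], a2[:, i], assume_unique=unique)]
    return a1 - a2

array1 = [[1, 100.0, 0.0, 0.0], [2, 110.0, 0.0, 0.0], [3, 120.0, 0.0, 0.0]]
array2 = [[1, 101.0, 0.0, 0.0], [3, 119, 0.0, 0.0]]

result = common_subtract(array1, array2)
print(result)  # rows for time 1 and time 3
```

Note that `unique=True` is only safe when the values in column `i` really are unique within each array, and that the row-wise subtraction lines up correctly only if both arrays list the common times in the same order.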

score 0 · Answer 3 · answered Jun 30 '20 at 16:13

I found intersect1d to be a clearer way to find common elements in 2D numpy arrays. In this case, recent_books and coding_books have already been defined.

import time
import numpy as np

start = time.time()
recent_coding_books = np.intersect1d([recent_books], [coding_books])
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))
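
Since recent_books and coding_books are not shown, here is a small sketch with made-up 2D arrays of IDs (the values are purely hypothetical). It illustrates that np.intersect1d flattens its inputs and returns the sorted, unique common values as a 1D array, so it identifies shared values but does not by itself select matching rows.

```python
import numpy as np

# Hypothetical ID arrays, invented for illustration only.
recent_books = np.array([[5, 3], [9, 3]])
coding_books = np.array([[3, 7], [5, 11]])

# Inputs are flattened; the result is 1D, sorted, and without duplicates.
recent_coding_books = np.intersect1d(recent_books, coding_books)

print(recent_coding_books)       # [3 5]
print(recent_coding_books.ndim)  # 1
```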
