I'm trying to find the union of two 2D arrays based on the first column:

>>> x1
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> x2
array([[ 7, -1, -1],
       [10, 11, 12]])

If two rows have a matching first value, I want the one from x2. That is, the union of the first columns x1[:, 0] and x2[:, 0] is [1, 4, 7, 10], and for the key 7 I want the row [7, -1, -1] from x2, not [7, 8, 9] from x1. The expected result in this case is:

>>> res
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7, -1, -1],
       [10, 11, 12]])

There is a possible solution for the union of 2D arrays here, but with it I get this result:

>>> res
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7, -1, -1],
       [ 7,  8,  9],
       [10, 11, 12]]) 

In this result, I want the row [7, 8, 9] from x1 to be excluded. How can I do that?

Mad Physicist
newkid
  • What does `where the row from x2 is preferred` mean? – yatu Sep 01 '20 at 14:31
  • Having all 5 rows is the union of both arrays. Why would you take [7, -1, -1] over [7, 8, 9]? Are you saying that you want the first number of each row to be unique? If so, why not just iterate over them, deleting any row that contains a number in row[0] that you've already encountered? – fooiey Sep 01 '20 at 14:35
  • I think it should also be ok to delete intersections of row 0 in x1 and row_stack x1 and x2 – newkid Sep 01 '20 at 14:37
  • What if rows are duplicated within the matrix? – Mad Physicist Sep 01 '20 at 15:21

1 Answer

You can use np.unique and np.concatenate, placing x2 first. With return_index=True, np.unique returns the index of the first occurrence of each unique value, so when a key appears in both arrays, the row from x2 wins:

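import numpy as np
# Put x2's keys first so that the first occurrence of a shared key is the x2 one
values = np.concatenate((x2[:, 0], x1[:, 0]))
_, index = np.unique(values, return_index=True)
# Indices >= x2.shape[0] point into x1 (after subtracting the offset); the rest point into x2
mask = index >= x2.shape[0]
result = np.concatenate((x1[index[mask] - x2.shape[0], :], x2[index[~mask], :]), axis=0)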

The result is exactly the array you would expect:

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7, -1, -1],
       [10, 11, 12]])
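
The same first-occurrence idea can also be applied to the stacked rows directly, which avoids the mask and offset bookkeeping. The following is a minimal sketch using the arrays from the question; the names stacked, first, and res are only illustrative. Unlike the snippet above, the output here is sorted by the first column across both arrays.

```python
import numpy as np

x1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x2 = np.array([[7, -1, -1], [10, 11, 12]])

# Stack x2 on top so that its rows come first among rows sharing a key.
stacked = np.concatenate((x2, x1), axis=0)

# return_index gives the position of the first occurrence of each unique key.
_, first = np.unique(stacked[:, 0], return_index=True)

# Select those rows; they come out sorted by the first column.
res = stacked[first]
print(res)
```

If a key is repeated within x2 (or within x1), only the first row carrying that key in the stacked array is kept.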

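Keep in mind that np.unique returns its values sorted, so index follows sorted key order. The result is therefore the surviving x1 rows followed by the x2 rows, each group sorted by its first column, which coincidentally happens to correspond to your original order. You can get the original order with a clever application of return_inverse=True, which will be left as an exercise for the reader.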
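
If sorting is not wanted at all, one alternative is to swap np.unique for np.isin: drop the x1 rows whose key also appears in x2, then append x2 unchanged. This is a different technique from the return_inverse approach hinted at above, and it is only a sketch under the assumption that keys are unique within each array (duplicates inside x1 or x2 would all be kept). The helper name union_keep_order and the sample arrays a and b are hypothetical.

```python
import numpy as np

def union_keep_order(x1, x2):
    """Hypothetical helper: x1 rows whose key is absent from x2, then all of x2."""
    # True for x1 rows whose first-column value does not occur in x2's first column.
    keep = ~np.isin(x1[:, 0], x2[:, 0])
    # Both groups keep their original row order; nothing is sorted.
    return np.concatenate((x1[keep], x2), axis=0)

# Deliberately unsorted inputs to show that input order is preserved.
a = np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]])
b = np.array([[10, 11, 12], [7, -1, -1]])
out = union_keep_order(a, b)
print(out)
```

Here the replacement rows from x2 end up at the bottom rather than in the position of the x1 row they replace, so this fits only when that ordering is acceptable.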

Mad Physicist