numpy unique over multiple arrays

Question

numpy.unique operates on a 1-D array. If the input is not 1-D, it is flattened by default.

Is there a way for it to accept multiple arrays? To keep it simple, let's just say a pair of arrays, and we are unique-ing the pair of elements across the 2 arrays.

For example, say I have 2 numpy arrays as inputs

a = [1,    2,    3,    3]
b = [10,   20,   30,  31]

I'm unique-ing against both of these arrays, so against these 4 pairs: (1,10), (2,20), (3,30), and (3,31). These 4 are all unique, so I want my result to say

[True, True, True, True]

If instead the inputs are as follows

a = [1,    2,    3,    3]
b = [10,   20,   30,  30]

Then the last 2 elements are not unique. So the output should be

[True, True, True, False]

Have you looked at the `axis` parameter? – hpaulj Dec 07 '20 at 20:19

Tonechas · Accepted Answer · 2020-12-08T03:07:36.367


You could use the unique_indices value returned by numpy.unique():

In [242]: import numpy as np

In [243]: def is_unique(*lsts):
     ...:     arr = np.vstack(lsts)
     ...:     _, ind = np.unique(arr, axis=1, return_index=True)
     ...:     out = np.zeros(shape=arr.shape[1], dtype=bool)
     ...:     out[ind] = True
     ...:     return out

In [244]: a = [1, 2, 2, 3, 3]

In [245]: b = [1, 2, 2, 3, 3]

In [246]: c = [1, 2, 0, 3, 3]

In [247]: is_unique(a, b)
Out[247]: array([ True,  True, False,  True, False])

In [248]: is_unique(a, b, c)
Out[248]: array([ True,  True,  True,  True, False])

You may also find this thread helpful.

edited Dec 08 '20 at 03:07

answered Dec 07 '20 at 21:53


Thank you. This makes sense. Follow-up question: what is your recommendation if my input lists have different dtypes? e.g. a.dtype=int64, b.dtype=datetime64[D]. `vstack` complains about invalid type promotion. What about turning the datetime64[D] array into an array of hashes? – user3240688 Dec 08 '20 at 14:54
Creating an array of objects would be a possible way to go: `np.array([a, b], dtype=object)` – Tonechas Dec 08 '20 at 15:25
But the `axis` argument to `unique` is not supported for dtype object. – user3240688 Dec 08 '20 at 17:18
On second thought, I'd just convert the date array into a numpy array of strings. `vstack` and `axis` work fine given that. That's a reasonable solution, right? – user3240688 Dec 08 '20 at 17:26
Yeah, it seems like a sensible workaround – Tonechas Dec 08 '20 at 17:29
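
For reference, the string-conversion workaround discussed in the comments above could look roughly like the sketch below. The helper name `is_unique_mixed` and the sample dates are made up for illustration and are not from the thread. It casts every input to `str` rather than only the date array, on the assumption that a single string dtype is the simplest thing `vstack` can combine. As in the answer, the first occurrence of each tuple is marked True and later repeats False.

```python
import numpy as np

def is_unique_mixed(*arrays):
    # Hypothetical helper (not from the thread): cast every input to str so
    # that np.vstack can combine arrays whose dtypes do not promote together.
    as_text = np.vstack([np.asarray(arr).astype(str) for arr in arrays])
    # Each column of as_text is one tuple; return_index gives the position
    # of the first occurrence of every distinct column.
    _, first_idx = np.unique(as_text, axis=1, return_index=True)
    out = np.zeros(as_text.shape[1], dtype=bool)
    out[first_idx] = True
    return out

# Sample inputs (assumed for illustration): int64 values paired with dates.
a = np.array([1, 2, 3, 3], dtype=np.int64)
b = np.array(["2020-12-07", "2020-12-08", "2020-12-09", "2020-12-09"],
             dtype="datetime64[D]")

result = is_unique_mixed(a, b)
print(result)
```

One caveat with this approach: values are compared by their string form, so it is only safe when distinct values within one array always produce distinct strings, which holds for integers and for dates at a fixed resolution.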
