
I have a 2-D list/array as follows:

list1 = [[1,2],[3,4]]
list2 = [[3,4],[5,6]]

How can I use a function like union1d(x,y) to merge list1 and list2 into one list:

list3 = [[1,2],[3,4],[5,6]]
Yi Zhang
  • The answers here http://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array address both the simple case (where “unique” sub-lists means “bit-exact”) and the floating-point case (where you want to treat two sub-lists as “equal” if they’re within some tolerance of each other). Does this answer your question? – Ahmed Fasih Aug 22 '16 at 16:10
  • @AhmedFasih thanks, and to get the unique 2-D list, the answer at http://stackoverflow.com/questions/39081807/python-2-d-list-how-to-make-a-set has solved my uniqueness problem; in this question I mainly want to find a function that can merge two lists simply – Yi Zhang Aug 22 '16 at 16:34
  • For small lists, the Python list methods in your other SO question are going to be faster than these `numpy` ones. It takes time to convert a list into an array. – hpaulj Aug 22 '16 at 20:28
  • A list version based on the other SO question: `[list(x) for x in {tuple(x) for x in list1+list2}]` – hpaulj Aug 22 '16 at 20:38

2 Answers


union1d just does:

unique(np.concatenate((ar1, ar2)))

so if you have a method of finding unique rows, you have the solution.
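
To see why union1d alone doesn't answer the question directly, note that it ravels its inputs, so the row structure is lost (a quick sketch):

import numpy as np

list1 = [[1,2],[3,4]]
list2 = [[3,4],[5,6]]

# union1d flattens everything before taking the union
np.union1d(list1, list2)
# -> array([1, 2, 3, 4, 5, 6])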

As described in the suggested link, and elsewhere, you can do this by converting the array to a 1d structured array. Here's the simple version.

If arr is:

arr=np.array([[1,2],[3,4],[3,4],[5,6]])

the structured equivalent (a view, same data):

In [4]: arr.view('i,i')
Out[4]: 
array([[(1, 2)],
       [(3, 4)],
       [(3, 4)],
       [(5, 6)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

In [5]: np.unique(arr.view('i,i'))
Out[5]: 
array([(1, 2), (3, 4), (5, 6)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

and back to 2d int:

In [7]: np.unique(arr.view('i,i')).view('2i')
Out[7]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

This solution does require a certain familiarity with compound dtypes.
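
Putting those pieces together, here's a minimal sketch of a row-wise union helper; union2d is a hypothetical name, not a numpy function, and it assumes C-contiguous integer inputs with the same number of columns:

import numpy as np

def union2d(a, b):
    # hypothetical row-wise analogue of np.union1d
    arr = np.concatenate((np.asarray(a), np.asarray(b)))
    # view each row as one structured record so np.unique can compare rows
    rec = arr.view([('', arr.dtype)] * arr.shape[1])
    # unique (sorted) records, then view back as a 2d array
    return np.unique(rec).view(arr.dtype).reshape(-1, arr.shape[1])

union2d([[1,2],[3,4]], [[3,4],[5,6]])
# -> array([[1, 2],
#           [3, 4],
#           [5, 6]])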

Using return_index avoids that final view back to 2d; we can index arr directly with the returned indices:

In [54]: idx=np.unique(arr.view('i,i'),return_index=True)[1]

In [55]: arr[idx,:]
Out[55]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

For what it's worth, unique does a sort and then uses a mask approach to remove adjacent duplicates.

It's the sort that requires a 1d array; the rest works in 2d. A row is kept if it differs from its predecessor in any column.

Here arr is already sorted:

In [42]: flag=np.concatenate([[True],(arr[1:,:]!=arr[:-1,:]).any(axis=1)])

In [43]: flag
Out[43]: array([ True,  True, False,  True], dtype=bool)

In [44]: arr[flag,:]
Out[44]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

https://stackoverflow.com/a/16971324/901925 shows this working with lexsort.
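
For reference, a minimal sketch of that lexsort route (my paraphrase, not a quote of the linked answer), applied to an unsorted arr:

import numpy as np

arr = np.array([[3,4],[1,2],[3,4],[5,6]])

# lexsort treats the last key as primary, so reverse the columns
# to sort rows with the first column as the primary key
order = np.lexsort(arr.T[::-1])
sarr = arr[order]

# keep a row if it differs from its predecessor in any column
flag = np.concatenate([[True],(sarr[1:]!=sarr[:-1]).any(axis=1)])
sarr[flag]
# -> array([[1, 2],
#           [3, 4],
#           [5, 6]])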

================

The mention of np.union1d set me and Divakar to focus on numpy methods. But if you are starting with lists (of lists), it is likely to be faster to use Python set methods.

For example, using list and set comprehensions:

In [99]: [list(x) for x in {tuple(x) for x in list1+list2}]
Out[99]: [[1, 2], [3, 4], [5, 6]]

You could also take the set of each list and do a set union, as in the sketch below.

The tuple conversion is needed because a list isn't hashable.
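
A minimal sketch of that set-union variant:

list1 = [[1,2],[3,4]]
list2 = [[3,4],[5,6]]

# tuples are hashable, so each sub-list can become a set element
merged = {tuple(x) for x in list1} | {tuple(x) for x in list2}
list3 = sorted(list(t) for t in merged)   # sort for a deterministic order
# -> [[1, 2], [3, 4], [5, 6]]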

hpaulj
  • All your examples use `unique`, not `union1d`. Is this deliberate? – Eric Aug 22 '16 at 19:29
  • Just easier to deal with one array after concatenation. `union1d` starts with a concatenation. Divakar tries to make the case for performing the union without first concatenating. – hpaulj Aug 22 '16 at 20:14

One approach would be to stack the two input arrays vertically with np.vstack and then find the unique rows in the stacked array. It would be memory intensive, as we would stack everything first only to discard the duplicate rows thereafter.

Another approach would be to find the rows in the first array that are exclusive to it, i.e. not present in the second array, and then stack just those exclusive rows along with the second array. Of course, this assumes that the rows within each input array are already unique.

The crux of such a memory-saving implementation is getting those exclusive rows from the first array. To do so, we convert each row into a linear index equivalent, treating each row as an indexing tuple on an n-dimensional grid, where n is the number of columns in the input arrays. Thus, assuming the input arrays are arr1 and arr2, we would have an implementation like so -

# Get dim of ndim-grid on which linear index equivalents are to be mapped
dims = np.maximum(arr1.max(0),arr2.max(0)) + 1

# Get linear index equivalents for arr1, arr2
idx1 = np.ravel_multi_index(arr1.T,dims)
idx2 = np.ravel_multi_index(arr2.T,dims)

# Finally get the exclusive rows and stack with arr2 for desired o/p
out = np.vstack((arr1[~np.in1d(idx1,idx2)],arr2))

Sample run -

In [93]: arr1
Out[93]: 
array([[1, 2],
       [3, 4],
       [5, 3]])

In [94]: arr2
Out[94]: 
array([[3, 4],
       [5, 6]])

In [95]: out
Out[95]: 
array([[1, 2],
       [5, 3],
       [3, 4],
       [5, 6]])

For more info on setting up those linear index equivalents, please refer to this post.
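
As a quick illustration of those linear index equivalents, using the sample arrays above (values worked out by hand):

import numpy as np

arr1 = np.array([[1,2],[3,4],[5,3]])
arr2 = np.array([[3,4],[5,6]])

dims = np.maximum(arr1.max(0),arr2.max(0)) + 1   # array([6, 7]): a 6x7 grid
# each row [r, c] maps to the scalar r*7 + c on that grid
idx1 = np.ravel_multi_index(arr1.T,dims)         # array([ 9, 25, 38])
idx2 = np.ravel_multi_index(arr2.T,dims)         # array([25, 41])

# 25 appears in both, so row [3, 4] of arr1 is not exclusive
np.in1d(idx1,idx2)                               # array([False,  True, False])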

Divakar