Find unique columns and column membership

Question

I went through these threads:

and they all discuss several methods for computing the matrix with unique rows and columns.

However, the solutions look a bit convoluted, at least to the untrained eye. Here, for example, is the top solution from the first thread, which (correct me if I am wrong) I believe is the safest and fastest:

np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])

Either way, the above solution only returns the matrix of unique rows. What I am looking for is something along the lines of the original functionality of np.unique:

u, indices = np.unique(a, return_inverse=True)

which returns not only the list of unique entries, but also the membership of each item in the unique entries found. How can I do this for columns?

Here is an example of what I am looking for:

array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
       [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

We would have:

u       = array([0,1,2,3,4])
indices = array([0,1,0,1,1,2,3,4,4,3])

Where the different values in u represent the set of unique columns in the original array:

0 -> [0,0]
1 -> [2,1]
2 -> [0,1]
3 -> [2,2]
4 -> [1,2]

Daniel · Answer 1 · 2013-08-13T01:53:33.337

First, let's get the unique indices. To do so, we need to start by transposing your array:

>>> a=a.T

Then use a modified version of the code above to get the unique indices:

>>> ua, uind = np.unique(np.ascontiguousarray(a).view(np.dtype((np.void,a.dtype.itemsize * a.shape[1]))),return_inverse=True)

>>> uind
array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])

# Thanks to @Jaime
>>> ua = ua.view(a.dtype).reshape(ua.shape + (-1,))
>>> ua
array([[0, 0],
       [0, 1],
       [1, 2],
       [2, 1],
       [2, 2]])

For sanity:

>>> np.all(a==ua[uind])
True

To reproduce your chart:

>>> for x in range(ua.shape[0]):
...     print(x, '->', ua[x])
...
0 -> [0 0]
1 -> [0 1]
2 -> [1 2]
3 -> [2 1]
4 -> [2 2]
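
The steps so far can be collected into one self-contained Python 3 sketch. The helper name `unique_rows_with_inverse` is made up for illustration and is not part of NumPy. One caveat I would hedge on: the void view is compared byte by byte, so the order of the unique entries may not be the numeric order in general (for example with negative values), although the membership indices remain consistent with whatever order comes back.

```python
import numpy as np

def unique_rows_with_inverse(rows):
    """Hypothetical helper: unique rows of a 2-D array plus membership indices."""
    rows = np.ascontiguousarray(rows)
    # Treat every row as a single opaque blob of bytes so np.unique compares whole rows.
    blob_dtype = np.dtype((np.void, rows.dtype.itemsize * rows.shape[1]))
    blobs = rows.view(blob_dtype).ravel()
    ublobs, inverse = np.unique(blobs, return_inverse=True)
    # View the unique blobs back as the original dtype, one row per unique entry.
    urows = ublobs.view(rows.dtype).reshape(-1, rows.shape[1])
    return urows, inverse

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# Columns of `a` are rows of `a.T`.
ua, uind = unique_rows_with_inverse(a.T)

# Same sanity check as above: indexing the unique rows rebuilds the input.
reconstructed_ok = bool(np.all(a.T == ua[uind]))
print(reconstructed_ok)
```

Passing `a.T` rather than `a` is the only thing that makes this about columns; the helper itself only knows about rows.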

To do exactly what you ask, working on the columns directly (this will be a bit slower if the array has to be converted to Fortran order):

>>> b=np.asfortranarray(a).view(np.dtype((np.void,a.dtype.itemsize * a.shape[0])))
>>> ua,uind=np.unique(b,return_inverse=True)
>>> uind
array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
>>> ua.view(a.dtype).reshape(ua.shape+(-1,),order='F')
array([[0, 0, 1, 2, 2],
       [0, 1, 2, 1, 2]])

#To return this in the previous order.
>>> ua.view(a.dtype).reshape(ua.shape + (-1,))

You already have the unique array in `ua`, all you need to do is `.view` it properly, no need to do any data copying. I think the best option is to do `ua = ua.view(a.dtype).reshape(ua.shape + (-1,))`. — Jaime, Aug 12 '13 at 22:58

score 2 · Accepted Answer · edited Aug 13 '13 at 14:19

Essentially, you want np.unique to return the indexes of the unique columns, and the indices of where they're used? This is easy enough to do by transposing the matrix and then using the code from the other question, with the addition of return_inverse=True.

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, u, indices = np.unique(b, return_index=True, return_inverse=True)

With your a, this gives:

In [35]: u
Out[35]: array([0, 5, 7, 1, 6])

In [36]: indices
Out[36]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])

It's not entirely clear to me what you want u to be, however. If you want it to be the unique columns, then you could use the following instead:

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, idx, indices = np.unique(b, return_index=True, return_inverse=True)
u = a[:,idx]

This would give

In [41]: u
Out[41]:
array([[0, 0, 1, 2, 2],
       [0, 1, 2, 1, 2]])

In [42]: indices
Out[42]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
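
For readers on a newer NumPy: as far as I know, `np.unique` has accepted an `axis` argument since version 1.13, which avoids the void view altogether. The following is a minimal sketch under that assumption, not what the answer above uses. The `.ravel()` call is a defensive assumption on my part, since I believe at least one NumPy release returned the inverse with an extra dimension when `axis` is given.

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# axis=1 asks for unique columns; requires a NumPy version that supports `axis`.
u, idx, indices = np.unique(a, axis=1, return_index=True, return_inverse=True)
indices = indices.ravel()  # defensive: make sure the membership array is 1-D

print(u)        # the unique columns, sorted lexicographically
print(idx)      # index of the first occurrence of each unique column in `a`
print(indices)  # for each column of `a`, which unique column it equals
```

With the example array this should reproduce the `u`, `idx` and `indices` values shown above, and `u[:, indices]` rebuilds `a`.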

Thanks, this is exactly what I was looking for. One question, why use the `_` in `np.ascontiguousarray()` ? I noticed that if I try to capture that output in a variable instead, and then I try to print that variable in an IPython shell, I get garbage (like trying to `cat` a binary on the shell) — Amelio Vazquez-Reina, Aug 13 '13 at 14:20
Do you mean as output from np.unique? You *shouldn't* get that sort of output; at least in my case (iPython 1.0, numpy 1.7.1) I get a normal array representation with blank elements. The output from np.unique that's discarded is still in the void dtype; as Ophion notes, you could convert it back with (assigning _ instead to uu) `uu.view(a.dtype).reshape(uu.shape + (-1,))` — cge, Aug 13 '13 at 16:26
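
One difference from the chart in the question: `np.unique` returns the unique columns in sorted order, whereas the question numbers them in order of first appearance. If that numbering matters, the labels can be remapped afterwards. Here is a small sketch that starts from the `idx` and `indices` values printed in the answer above; the names `order`, `remap`, `u_first` and `indices_first` are only illustrative.

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# Values taken from the answer's output (sorted-order labelling).
idx = np.array([0, 5, 7, 1, 6])                      # first occurrence of each unique column
indices = np.array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])   # membership of each column of `a`

# Sorting the first-occurrence positions gives the order of first appearance.
order = np.argsort(idx)

# remap[old_label] = new_label, where new labels count up in order of first appearance.
remap = np.empty_like(order)
remap[order] = np.arange(order.size)

u_first = a[:, idx[order]]      # unique columns in order of first appearance
indices_first = remap[indices]  # membership expressed with the new labels

print(u_first)
print(indices_first)
```

For the example array this gives the labelling 0 -> [0,0], 1 -> [2,1], 2 -> [0,1], 3 -> [2,2], 4 -> [1,2], with `indices_first` equal to `[0, 1, 0, 1, 1, 2, 3, 4, 4, 3]`.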

Eelco Hoogendoorn · Answer 3 · 2016-04-02T19:50:21.110

1

Not entirely sure what you are after, but have a look at the numpy_indexed package (disclaimer: I am its author); it is sure to make problems of this kind easier:

import numpy as np
import numpy_indexed as npi

# A is the 2-D input array (called `a` in the question)
unique_columns = npi.unique(A, axis=1)
# or perhaps this is what you want?
unique_columns, indices = npi.group_by(A.T, np.arange(A.shape[1]))

edited Apr 02 '16 at 19:50

answered Apr 02 '16 at 14:40
