Python/Numpy: Grouping array-rows by a common element

Question

Suppose I have an array like this:

[[1,2,3]
 [0,4,2]
 [4,2,5] 
 [6,1,1]
 [1,3,5]
 [3,0,1]
 [0,4,2]]

I want to categorize the rows of the array by letting any row that have an element in common with some other row belong to the same category. In general, the arrays may not only consist of integers but could be any float. It is a requirement that the elements must agree in the same position. For the above array, the categories would be

[[0],
 [1],
 [0],
 [2],
 [0],
 [2],
 [1]]

Note: Every member in each category should share a common number at common position with AT LEAST ONE other member in the same category. Not all pairs of members in the same category need to share common number at common position.

Can you think of a solid way to do this?

It is not a vectorizable operation. Use `networkit.components`. — Marat, May 12 '20 at 23:58
3 rows in same category can share different position commonality? or should every pair in each category have commonality?(a.k.a transitivity applies to members of a category?) — Ehsan, May 13 '20 at 00:23
@Marat It seems to be vectorizable. While your suggestion is great (and I personally prefer that), please reject solutions if you are absolutely certain. In fact, there are vectorized solutions to components too. Thank you. — Ehsan, May 13 '20 at 00:26
@Ehsan Each pair in a category should have at least one common element in a common position. I.e the following belongs to the same category: (0,42,7), (0,61,31), (13,61,11),(13,59,11). Here the first two share a zero in the same position, the second and third has 61 in a common position, the third and fourth share both 13 and 11 in the same positions. Looking forward to your answer! — Mikkel Rev, May 13 '20 at 00:35
@MikkelRev your statement is in contradiction with your example: _Each pair in a category should have at least one common element_ but `(0,42,7)` and `(13,59,11)` do not share common element. So I guess you mean every element in each category should share a common number with AT LEAST one other element in the same category, correct? — Ehsan, May 13 '20 at 00:39
@Ehsan Yes, actually you are right: every element in each category should share a common number (in the same position) with at least one other element in the same category — Mikkel Rev, May 13 '20 at 00:45
@MikkelRev Thanks for clarifying, in that case please find my answer. I also added that as a note to your question for the reader. — Ehsan, May 13 '20 at 00:51

Ehsan · Accepted Answer · 2020-05-14T23:49:03.550

1

This gives you the common pairs of rows. The rest depend on your answer to my comment which I will update once I understand question correctly:

pairs = np.argwhere(((a[:,None]-a)==0).any(axis=2))

UPDATE: according to comments definition of category, this will return categories:

b = np.arange(a.shape[0])
for pair in pairs:
  b[np.flatnonzero(b==b[pair[1]])] = b[pair[0]]
b = b - b.min()

You can probably make this faster by previously removing self-edges and duplicate edges (there is two of each edge) from pairs prior to the for loop.

output:

[0 2 0 1 0 1 2]

category mapping name is different than output in question, but categories are the same. If you wish to name them differently, simply change last line of code b = b - b.min() to your desired naming.

Another approach is to use edgelist pairs to create a graph and extract the connected components.

edited May 14 '20 at 23:49

answered May 13 '20 at 00:40

Ehsan

12,072
2
20
33

@MikkelRev numpy is powerful. Of course another method is to use edgelist `pairs` to create a graph and extract connected components. Not sure which one is faster for you input size. – Ehsan May 13 '20 at 01:52
I suspect that there is a problem. I think it must be in the second to last line. Because if I try a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,28],[13,14,15,8],[1,18,19,20],[21,22,23,24],[25,18,27,28]]) : Clearly row number 0 and 4 are in the same category since they share a 1 in the first position, but the code prints [1, 0, 3, 0, 3, 2, 3]. Can you see an easy fix? – Mikkel Rev May 14 '20 at 21:30
@MikkelRev aah, seems there is. `pairs` is fine. I will fix it. Thank you for catching it. – Ehsan May 14 '20 at 21:39
1

@MikkelRev I fixed it. It is probably not the fastest. Might be a faster way to do it. but it works for now. Let me know if you need even faster way and I can try to push it. – Ehsan May 14 '20 at 23:38

Python/Numpy: Grouping array-rows by a common element

1 Answers1

Linked