How to group all labels (index) that share at least one "1" in the same column?

Question

Grouping rules:

Two labels belong to the same group when they have a "1" in the same column.
Groups that share any number of labels are merged into one, so membership is transitive (see example).

For example:

   c0  c1  c2  c3
A   1   0   0   1
B   0   0   1   0
C   0   0   0   1
D   0   1   1   0
E   0   1   0   0

Expected output:

[[A, C], [B, D, E]]

As you can see, B and E do not share a "1" in any column, but each shares one with D, so all three should be grouped together.
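To make the example reproducible, the table can be rebuilt roughly as follows. This is only a sketch: the variable names `df` and `members` are assumptions (the answers refer to a DataFrame called `df`, but its construction is not shown in the question).

```python
import pandas as pd

# Rebuild the example table from the question.
# The name `df` is an assumption chosen to match the answers.
df = pd.DataFrame(
    [[1, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 0]],
    index=list("ABCDE"),
    columns=["c0", "c1", "c2", "c3"],
)

# For each column, list the labels that hold a 1 in it.
# Labels listed together here must end up in the same group.
members = {col: df.index[df[col] == 1].tolist() for col in df.columns}
print(members)
# {'c0': ['A'], 'c1': ['D', 'E'], 'c2': ['B', 'D'], 'c3': ['A', 'C']}
```

Columns c1 and c2 both contain D, which is why B, D and E collapse into a single group.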

I suggest you add more explanation of your row-grouping rules to your question. – omri_saadon, Sep 13 '17 at 15:24
You'll need some network analysis to find all the distinct groups – Ted Petrou, Sep 13 '17 at 15:32
There is a solution using `networkx` that makes things a bit simpler. – Zero, Sep 13 '17 at 15:33
@omri_saadon, I've added more explanation about the rules, thanks for the suggestion – wik, Sep 13 '17 at 15:42
After using @TedPetrou's deduction, if you don't want to use `networkx`, see [Merge lists that share common elements](https://stackoverflow.com/questions/4842613/merge-lists-that-share-common-elements?noredirect=1&lq=1) for alternate methods – Zero, Sep 13 '17 at 16:12
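For readers who prefer to avoid `networkx`, as the last comment suggests, one possible approach is a small union-find (disjoint set) over the index labels. The sketch below is not taken from the linked question; the function name `group_labels` and the DataFrame construction are assumptions made for illustration.

```python
import pandas as pd

# Example table from the question; the construction itself is an assumption.
df = pd.DataFrame(
    [[1, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 0]],
    index=list("ABCDE"),
    columns=["c0", "c1", "c2", "c3"],
)


def group_labels(frame):
    """Hypothetical helper: merge labels that share a 1 in any column."""
    # Every label starts as the root of its own group.
    parent = {label: label for label in frame.index}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for col in frame.columns:
        members = frame.index[frame[col] == 1].tolist()
        # Union every label in this column with the first one.
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    # Collect labels by their root, keeping index order.
    groups = {}
    for label in frame.index:
        groups.setdefault(find(label), []).append(label)
    return list(groups.values())


groups = group_labels(df)
print(groups)  # [['A', 'C'], ['B', 'D', 'E']]
```

Note that a label whose row contains no "1" at all would come out as a group of its own with this sketch.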

Ted Petrou · Accepted Answer · 2017-09-13T16:01:23.350

5

Here is a solution with networkx.

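import networkx as nx
import numpy as np

# For each column, concatenate the labels of the rows that hold a 1, e.g. 'AC'
a = np.where(df.T, df.index, '').sum(axis=1)
# Split each string back into characters; each resulting pair is one edge
# (this relies on single-character labels and no more than two labels per column)
g = [list(x) for x in a if len(x) > 1]
G = nx.Graph(g)
list(nx.connected_components(G))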

[{'B', 'D', 'E'}, {'A', 'C'}]
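The snippet above joins labels into strings and splits them into characters again, so it appears to depend on one-character labels and on each column containing at most two of them. A more general variant, sketched here under the assumption that `df` is built as in the question, adds one path per column with `nx.add_path`, so that labels of any length and columns with any number of labels are handled.

```python
import networkx as nx
import pandas as pd

# Example table from the question; the construction itself is an assumption.
df = pd.DataFrame(
    [[1, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 0]],
    index=list("ABCDE"),
    columns=["c0", "c1", "c2", "c3"],
)

G = nx.Graph()
for col in df.columns:
    # Labels of the rows holding a 1 in this column.
    members = df.index[df[col].astype(bool)].tolist()
    # Chain them together: a path is enough to put them in one component.
    nx.add_path(G, members)

# Sort inside and across groups only to get a stable, readable result.
groups = sorted(sorted(component) for component in nx.connected_components(G))
print(groups)  # [['A', 'C'], ['B', 'D', 'E']]
```

Linking the members of a column as a path rather than as all pairs keeps the number of edges linear in the column size while producing the same connected components.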

edited Sep 13 '17 at 16:01

answered Sep 13 '17 at 15:48


I was on `[df.index[df[c].astype(bool)].tolist() for c in df.columns]` which might be slower than this. – Zero Sep 13 '17 at 15:56
Thanks very much guys! – Ted Petrou Sep 13 '17 at 16:08
This is nice +1. – Mohamed Ali JAMAOUI Sep 13 '17 at 19:00

Mohamed Ali JAMAOUI · Answer 2 · 2017-09-13T16:03:01.800

This can achieve what you want:

import numpy as np
from itertools import combinations 
import networkx as nx

df
"""output:  
   1  2  3  4
0            
A  1  0  0  1
B  0  0  1  0
C  0  0  0  1
D  0  1  1  0
E  0  1  0  0
"""

df.index.tolist()
"""output:
['A', 'B', 'C', 'D', 'E']
"""
list(combinations(df.index.tolist(),2))

"""output : 
[('A', 'B'),
 ('A', 'C'),
 ('A', 'D'),
 ('A', 'E'),
 ('B', 'C'),
 ('B', 'D'),
 ('B', 'E'),
 ('C', 'D'),
 ('C', 'E'),
 ('D', 'E')]
"""
# Keep only the pairs whose rows have a "1" in at least one common column
results = [
    (a, b)
    for a, b in combinations(df.index.tolist(), 2)
    if np.sum(df.loc[a, :].multiply(df.loc[b, :])) > 0
]

results
"""output: 
[('A', 'C'), ('B', 'D'), ('D', 'E')]
"""
list(nx.connected_components(nx.Graph(results)))
"""output: 
[{'A', 'C'}, {'B', 'D', 'E'}]
"""
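The pairwise check above can also be expressed as a single matrix product: entry (i, j) of `df @ df.T` counts the columns in which rows i and j both hold a 1. The following is a sketch of that idea, not part of the original answer; it swaps `networkx` for `scipy.sparse.csgraph.connected_components`, and the DataFrame construction is an assumption.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Example table from the question; the construction itself is an assumption.
df = pd.DataFrame(
    [[1, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, 0, 0]],
    index=list("ABCDE"),
    columns=[1, 2, 3, 4],
)

values = df.to_numpy()
# Rows i and j are adjacent when they share a 1 in at least one column.
adjacency = (values @ values.T) > 0

# Label every row with the id of its connected component.
n_groups, component_ids = connected_components(
    csr_matrix(adjacency.astype(int)), directed=False
)

# Turn the component ids back into lists of index labels.
groups = [df.index[component_ids == k].tolist() for k in range(n_groups)]
print(sorted(groups))  # [['A', 'C'], ['B', 'D', 'E']]
```

This avoids the Python-level loop over all label pairs, at the cost of building an n-by-n matrix for n labels.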
