1

I am looking for a smart algorithm or pythonic approach to create clusters from pairs of data.

The input data is structured like this:

[
 (productA,ProductB),
 (productB,ProductC),
 (productC,ProductD),
 (productA,ProductD),
 (productD,ProductB),
 (productC,ProductA),

 (productE,ProductF),
 (productF,ProductG),
 (productG,ProductH),
 (productG,ProductE),
]

and it should be clustered to:

[
(productA,productB,productC,productD),
(productE,productF,productG,productH)
]

How can this be achieved? (The order of the products within the two clusters does not matter)

Any ideas are greatly appreciated!

Jabb
  • 3,414
  • 8
  • 35
  • 58
  • A *cluster* is the transitive closure over all products linked via some (directional) pair? Is the relation symmetric? – dhke Jan 26 '16 at 19:10

2 Answers2

3

Using networkx, you could build a graph and find the connected components:

import networkx as nx

data = [
 ('productA','productB'),
 ('productB','productC'),
 ('productC','productD'),
 ('productA','productD'),
 ('productD','productB'),
 ('productC','productA'),
 ('productE','productF'),
 ('productF','productG'),
 ('productG','productH'),
 ('productG','productE'),
]

G = nx.Graph()
G.add_edges_from(data)
for connected_component in nx.connected_components(G):
    print(connected_component)

yields

['productG', 'productF', 'productE', 'productH']
['productD', 'productC', 'productB', 'productA']
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • will try this instantly. Two things to note: 1. I love SO... always a solution to any problem, 2. I love python... – Jabb Jan 26 '16 at 19:17
0

What you are looking for is: Quick Union algorithm.

Raunak Agarwal
  • 7,117
  • 6
  • 38
  • 62
  • QU requires the relation to be transitive, reflexive and symmetric. The examples don't show if the relation is symmetric. – dhke Jan 26 '16 at 19:14