0

Aim is to group (cluster) objects into non-intersecting sets (clusters) based on the multiple properties. However not by a combination of the properties. Here is the algorithm explained in the human language.

  • There is an array of objects.
  • There are two special properties in the objects by which all the objects should be grouped/clustered.
  • First, group the objects by the first property p1.
  • Thus we have divided the original array into several non-intersecting clusters.
  • If any object’s p2 property is matching any object from the other cluster, the two clusters should be merged into one.

Here is an example:

const items = [
      { a: 'file1', b: 'D1', s: 'S0' },
      { a: 'file2', b: 'D1', s: 'S1' },
      { a: 'file3', b: 'D1', s: 'S2' },
      { a: 'file4', b: 'D1', s: 'S2' },
      { a: 'file5', b: 'D2', s: 'S1' },
      { a: 'file6', b: 'D2', s: 'S5' },
      { a: 'file7', b: 'D3', s: 'S6' },
      { a: 'file8', b: 'D3', s: 'S7' },
    ];

Here, first property is b and the second property is s.

  • Grouping by b gives us clusters file1-file4, file5-file6, file7-file8.
  • file2 has s=S1 and so file5 does. They are in different clusters, so the clusters should be merged together. Now we have unified big cluster file1-file6.
  • Cluster file7-file8 does not intersect any other cluster in property s, so it stays isolated.

Here is the graphic representation of the main specialty of the algorithm:

Algoritm

Q1: Is there any well-known algorithm for such a task/problem?

Q2: What approach would be suggested to implement the described algorithm?

The programming language of the project is Node.js, however answer in any programming language is welcome.

AKd
  • 501
  • 4
  • 17
  • Seems similar or even duplicate question: https://stackoverflow.com/questions/967064/union-of-all-intersecting-sets – AKd May 01 '20 at 03:56
  • Found out that the task is called “Finding a partition of the set based on the described equivalence relation”: https://en.wikipedia.org/wiki/Partition_of_a_set – AKd May 01 '20 at 05:46

1 Answers1

0

The problem seems ad-hoc, and may not have a standard algorithm, even though the disjoint-set data structure (union-find algorithm) seems useful.

First of all, the number of clusters k is <= unique values of b properties.

So, we can create our clusters with the size k. We need to assign elements from items to these k clusters (some clusters may be empty).

First, we can run a linear scan on b features to assign each of them to a cluster. O(n)

Now, for each cluster, we calculate SF = set(s features).

For example, for cluster 1: (file1-4) will have SF set(s features) = {S0, S1, S2} as these are the unique s parameters in that cluster.

We then search, for each element in each cluster if the s parameter of that element matches with any element from any other clusters' SF, then we merge them, make both of their parents the same (union-find). This step will have complexity O(n . k . log(K)).

Zabir Al Nazi
  • 10,298
  • 4
  • 33
  • 60