Group/cluster objects by multiple properties/keys and merge intersecting groups

Question

Aim is to group (cluster) objects into non-intersecting sets (clusters) based on the multiple properties. However not by a combination of the properties. Here is the algorithm explained in the human language.

There is an array of objects.
There are two special properties in the objects by which all the objects should be grouped/clustered.
First, group the objects by the first property p1.
Thus we have divided the original array into several non-intersecting clusters.
If any object’s p2 property is matching any object from the other cluster, the two clusters should be merged into one.

Here is an example:

const items = [
      { a: 'file1', b: 'D1', s: 'S0' },
      { a: 'file2', b: 'D1', s: 'S1' },
      { a: 'file3', b: 'D1', s: 'S2' },
      { a: 'file4', b: 'D1', s: 'S2' },
      { a: 'file5', b: 'D2', s: 'S1' },
      { a: 'file6', b: 'D2', s: 'S5' },
      { a: 'file7', b: 'D3', s: 'S6' },
      { a: 'file8', b: 'D3', s: 'S7' },
    ];

Here, first property is b and the second property is s.

Grouping by b gives us clusters file1-file4, file5-file6, file7-file8.
file2 has s=S1 and so file5 does. They are in different clusters, so the clusters should be merged together. Now we have unified big cluster file1-file6.
Cluster file7-file8 does not intersect any other cluster in property s, so it stays isolated.

Here is the graphic representation of the main specialty of the algorithm:

Q1: Is there any well-known algorithm for such a task/problem?

Q2: What approach would be suggested to implement the described algorithm?

The programming language of the project is Node.js, however answer in any programming language is welcome.

Seems similar or even duplicate question: https://stackoverflow.com/questions/967064/union-of-all-intersecting-sets — AKd, May 01 '20 at 03:56
Found out that the task is called “Finding a partition of the set based on the described equivalence relation”: https://en.wikipedia.org/wiki/Partition_of_a_set — AKd, May 01 '20 at 05:46

Zabir Al Nazi · Answer 1 · 2020-05-01T04:08:03.223

The problem seems ad-hoc, and may not have a standard algorithm, even though the disjoint-set data structure (union-find algorithm) seems useful.

First of all, the number of clusters k is <= unique values of b properties.

So, we can create our clusters with the size k. We need to assign elements from items to these k clusters (some clusters may be empty).

First, we can run a linear scan on b features to assign each of them to a cluster. O(n)

Now, for each cluster, we calculate SF = set(s features).

For example, for cluster 1: (file1-4) will have SF set(s features) = {S0, S1, S2} as these are the unique s parameters in that cluster.

We then search, for each element in each cluster if the s parameter of that element matches with any element from any other clusters' SF, then we merge them, make both of their parents the same (union-find). This step will have complexity O(n . k . log(K)).

Group/cluster objects by multiple properties/keys and merge intersecting groups

1 Answers1