-2

Consider the following table:

ID  Feature
1   1
1   2
1   3
2   3
2   4
2   6
3   5
3   10
3   12
4   12
4   18
5   10
5   30

I would like to group the individuals based on overlapping features. If two of these groups again have overlapping features, I would consider both as one group. This process should be repeated until there are no overlapping features between groups. The result of this procedure on the table above would be:

ID  Feature Flag
1   1       A
1   2       A
1   3       A
2   3       A
2   4       A
2   6       A
3   5       B
3   10      B
3   12      B
4   12      B
4   18      B
5   10      B
5   30      B

So actually the problem I am trying to solve is finding connected components in a graph. Here [1,2,3] is the graph with ID 1 (see https://en.wikipedia.org/wiki/Connectivity_(graph_theory)). The problem is equivalent to this problem, however I would like to solve it with Oracle SQL.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
K. Roelofs
  • 103
  • 4
  • 1
    What version of Oracle are you using? – Gordon Linoff Oct 03 '18 at 14:42
  • I am using version 12c – K. Roelofs Oct 03 '18 at 14:46
  • if this is the connection 1:1,1:2,2:3,3:5 etc I cant see why 3:5 should be in group B and the other in group A? – W_O_L_F Oct 03 '18 at 15:01
  • ID 3 and 4 have common feature 12, lets call ID 3 and 4 group x. ID 3 and 5 have common feature 10, lets call ID 3 and 5 group y. Now group x and y have common feature 12. Therefore 3,4 and 5 belong to one group. – K. Roelofs Oct 03 '18 at 15:10
  • @K.Roelofs . . . Then there is hope of solving the problem using recursive CTEs. – Gordon Linoff Oct 03 '18 at 15:34
  • I see the question has two downvotes already. If the downvoters stop by again (not very likely, alas), perhaps they would like to explain what's wrong with the question. The downvotes suggest they didn't understand the question... which refers poorly on them, not on the person who posted a perfectly valid question. –  Oct 03 '18 at 19:57

1 Answers1

0

Here is one way to do this, using a hierarchical ("connect by") query. The first step is to extract the initial relationships from the base data; the hierarchical query is built on the result from this first step. I added one more row to the inputs to illustrate a node that is a connected component by itself.

You marked the connected components as A and B - of course, that won't work if you have, say, 30,000 connected components. In my solution, I use the minimum node name as the marker for each connected component.

with
  sample_data (id, feature) as (
    select 1,  1 from dual union all
    select 1,  2 from dual union all
    select 1,  3 from dual union all
    select 2,  3 from dual union all
    select 2,  4 from dual union all
    select 2,  6 from dual union all
    select 3,  5 from dual union all
    select 3, 10 from dual union all
    select 3, 12 from dual union all
    select 4, 12 from dual union all
    select 4, 18 from dual union all
    select 5, 10 from dual union all
    select 5, 30 from dual union all
    select 6, 40 from dual
  )
-- select * from sample_data; /*
, initial_rel(id_base, id_linked) as (
    select distinct s1.id, s2.id
      from sample_data s1 join sample_data s2
                          on s1.feature = s2.feature and s1.id <= s2.id
  )
-- select * from initial_rel; /*
select     id_linked as id, min(connect_by_root(id_base)) as id_group
from       initial_rel
start with id_base <= id_linked
connect by nocycle prior id_linked = id_base and id_base < id_linked
group by   id_linked
order by   id_group, id
;

Output:

     ID   ID_GROUP
------- ----------
      1          1
      2          1
      3          3
      4          3
      5          3
      6          6

Then, if you need to add the ID_GROUP as a FLAG to the base data, you can do so with a trivial join.