Alternative solution in one query:
with
-- sample data
t(Id, Name, Dataset) as (
select 1, 'ABC1', 'A:B;C:D;E:F' from dual union all
select 2, 'ABC2', 'A:B;C:D;R:S' from dual union all
select 3, 'ABC3', 'C:4;G:5;A:B' from dual union all
select 4, 'ABC4', 'A:B;C:D;E:F' from dual ),
-- end of sample data
q as (
select distinct id, name,
trim(regexp_substr(t.dataset, '[^;]+', 1, ls.column_value)) as ds
from t, table(cast(multiset(select level from dual
connect by level <= length(regexp_replace(t.dataset, '[^;]+'))+1)
as sys.odcinumberlist)) ls),
p as (select q1.id id1, q1.name name1, q2.id id2, q2.name name2, q1.ds set1, q2.ds set2,
max(case when q1.ds = q2.ds then 1 else 0 end)
over (partition by q1.id, q2.id, q1.ds) m1,
max(case when q1.ds = q2.ds then 1 else 0 end)
over (partition by q1.id, q2.id, q2.ds) m2
from q q1 join q q2 on q1.id <> q2.id),
a1 as (select distinct id1, id2, set1 ds from p where m1 = 0),
a2 as (select distinct id1, id2, set1 ds from p where m1 = 1),
a3 as (select distinct id1, id2, set2 ds from p where m2 = 0)
select t1.id id1, t1.name name1, t2.id id2, t2.name name2,
(select listagg(ds, ' ; ') within group (order by ds)
from a1 where id1 = t1.id and id2 = t2.id) l1,
(select listagg(ds, ' ; ') within group (order by ds)
from a2 where id1 = t1.id and id2 = t2.id) l2,
(select listagg(ds, ' ; ') within group (order by ds)
from a3 where id1 = t1.id and id2 = t2.id) l3
from t t1
join t t2 on t1.id <> t2.id;
Result:
ID1 NAME1 ID2 NAME2 L1 L2 L3
------ ----- ------ ----- ------------ ---------------- -------------
1 ABC1 2 ABC2 E:F A:B ; C:D R:S
1 ABC1 3 ABC3 C:D ; E:F A:B C:4 ; G:5
1 ABC1 4 ABC4 A:B ; C:D ; E:F
2 ABC2 1 ABC1 R:S A:B ; C:D E:F
2 ABC2 3 ABC3 C:D ; R:S A:B C:4 ; G:5
2 ABC2 4 ABC4 R:S A:B ; C:D E:F
3 ABC3 1 ABC1 C:4 ; G:5 A:B C:D ; E:F
3 ABC3 2 ABC2 C:4 ; G:5 A:B C:D ; R:S
3 ABC3 4 ABC4 C:4 ; G:5 A:B C:D ; E:F
4 ABC4 1 ABC1 A:B ; C:D ; E:F
4 ABC4 2 ABC2 E:F A:B ; C:D R:S
4 ABC4 3 ABC3 C:D ; E:F A:B C:4 ; G:5
12 rows selected
Subbquery q
uses one of the splitting techniques from SO to divide words into separate rows. Then I self-joined data and counted matching / non-matching words. Subqueries a1
- a3
are needed only because function listagg
does not respect distinct
clause.
This solution compares (1 and 4) and (4 and 1). You can change this to show results only once by replacing <>
in t1.id <> t2.id
and q1.id <> q2.id
with <
.