0

I have a table of 2 columns like below. One ID can have multiple Code.

╔═════════╗
║ ID Code ║
╠═════════╣
║ 1  0    ║
║ 1  16   ║
║ 1  41   ║
║ 2  1    ║
║ 2  40   ║
║ 3  41   ║
║ 3  67   ║
║ 4  70   ║
║ 5  16   ║
║ 5  67   ║
║ 6  41   ║
║ 6  68   ║
╚═════════╝

My desired outcome is identifying each ID belongs to which Group, by checking all Code of that ID. The checking rule is:

╔══════════════════════════════════╦═══════╗
║               Code               ║ Group ║
╠══════════════════════════════════╬═══════╣
║ Contains 0                       ║ M1    ║
║ Contains 1 or 16 or 40           ║ M2    ║
║ Contains 41 or 67                ║ M3    ║
║ Contains 68 or 70                ║ M4    ║
║ Contains codes of both M2 and M3 ║ M5    ║
║ Contains codes of both M3 and M4 ║ M6    ║
╚══════════════════════════════════╩═══════╝

Note: If ID has Code 0, Group is M1, then stop checking other rules.

The required output should looks like this:

╔══════════╗
║ ID Group ║
╠══════════╣
║ 1  M1    ║
║ 2  M2    ║
║ 3  M3    ║
║ 4  M4    ║
║ 5  M5    ║
║ 6  M6    ║
╚══════════╝

What I have tried so far is using STUFF and FOR XML PATH studied from this thread:

SELECT *, 
    STUFF((SELECT DISTINCT ', ' + Code FROM tblFee WHERE ID = t.ID FOR XML PATH ('')), 1, 1, '') AS Group
FROM (
    SELECT DISTINCT ID FROM tblFee 
) t

to have the result likes:

╔══════╦═══════════╗
║ Code ║   Group   ║
╠══════╬═══════════╣
║    1 ║ 0, 16, 41 ║
║    2 ║ 1, 40     ║
║    3 ║ 41, 67    ║
║    4 ║ 70        ║
║    5 ║ 16, 67    ║
║    6 ║ 41, 48    ║
╚══════╩═══════════╝

then use CASE...WHEN... and LIKE to check the rule. However my production data contains nearly 1 million records, so performance is a big problem.

Community
  • 1
  • 1
Trung DS
  • 103
  • 1
  • 6
  • 3
    Storing "rules" like that is just a bad idea. SQL doesn't understand English, so "M5 Contains codes of both M2 and M3" means nothing to it. You're better off storing proper data, which is normalised. – Thom A May 16 '20 at 20:38

2 Answers2

0

You can separate records to those having one code and others before applying the CASE operator:

create table #rules (code numeric, grp varchar(2));
insert into #rules (code, grp) values (0,'M1'),(1, 'M2'),(16, 'M2'),(40, 'M2'),
(41, 'M3'),(67, 'M3'),(68, 'M4'),(70, 'M4');

with raw_groups as (
select distinct ID, grp from table1 join #rules using(code)
group by ID, grp
order by ID
),

enumerated as (
select ID, grp, count(*) over (partition by ID) num
from raw_groups
),

mult as (
select id, string_agg(grp, ',') as groups
from enumerated
where num > 1
group by id
)

select ID, grp as 'GROUP' from enumerated 
where num=1

union all

select ID, 
case 
when groups like '%M1%' then 'M1'
when groups like '%M2%' and groups not like '%M1%' and groups not like '%M3%' and groups not like '%M4%' then 'M2'
when groups like '%M3%' and groups not like '%M1%' and groups not like '%M2%' and groups not like '%M4%' then 'M3'
when groups like '%M4%' and groups not like '%M1%' and groups not like '%M2%' and groups not like '%M3%' then 'M4'
when groups not like '%M1%' and groups like '%M2%' and groups like '%M3%' then 'M5'
when groups not like '%M1%' and groups like '%M3%' and groups like '%M4%' then 'M6'
else 'Rule not defined' end as 'GROUP'
from mult 
Vad1m
  • 389
  • 1
  • 14
0

One method is to use conditional aggregation and some case expressions:

select id,
       concat_ws(', ',
                 (case when has_code_0 > 0 then 'M1' end),
                 (case when has_code_1 > 0 or has_code_16 > 0 or has_code_40 then 'M2' end),
                 (case when has_code_41 > 0 or has_code_67 > 0 then 'M3' end),
                 (case when has_code_68 > 0 or has_code_70 > 0 then 'M4' end),
                 (case when (has_code_1 > 0 or has_code_16 > 0 or has_code_40) and (has_code_41 > 0 or has_code_67) then 'M5' end)
                 (case when (has_code_41 > 0 or has_code_67 > 0) and (has_code_68 > 0 or has_code_70 > 0) then 'M6' end)
                )
from (select t.id,
             max(case when code = 0 then 1 else 0 end) as has_code_0,
             max(case when code = 1 then 1 else 0 end) as has_code_1,
             max(case when code = 16 then 1 else 0 end) as has_code_16,
             max(case when code = 40 then 1 else 0 end) as has_code_40,
             max(case when code = 41 then 1 else 0 end) as has_code_41,
             max(case when code = 67 then 1 else 0 end) as has_code_67,
             max(case when code = 68 then 1 else 0 end) as has_code_68,
             max(case when code = 70 then 1 else 0 end) as has_code_70
      from t
      group by id
     ) t;

Note: concat_ws() is a recent addition to SQL Server. Slightly different code is needed in older versions.

There are various ways to structure this logic. For instance, you could assign the grouping flags in the subquery directly -- this is particularly appropriate if the groups are strictly hierarchical such as in your example:

select id,
       concat_ws(', ',
                 (case when in_group_1 > 0 then 'M1' end),
                 (case when in_group_2 > 0 then 'M2' end),
                 (case when in_group_3 > 0 then 'M3' end),
                 (case when in_group_4 > 0 then 'M4' end),
                 (case when in_group_2 > 0 and in_group_3 > 0 then 'M5' end),
                 (case when in_group_3 > 0 and in_group_4 > 0 then 'M6' end)
                )
from (select t.id,
             max(case when code = 0 then 1 else 0 end) as in_group_1,
             max(case when code = 1 then 1 else 0 end) as in_group_2,
             max(case when code = 16 then 1 else 0 end) as in_group_2,
             max(case when code = 40 then 1 else 0 end) as in_group_2,
             max(case when code = 41 then 1 else 0 end) as in_group_3,
             max(case when code = 67 then 1 else 0 end) as in_group_3,
             max(case when code = 68 then 1 else 0 end) as in_group_4,
             max(case when code = 70 then 1 else 0 end) as in_group_4
      from t
      group by id
     ) t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786