2
create table dupt(cat varchar(10), num int)

insert dupt(cat,num) values ('A',1),('A',2),('A',3),
                            ('B',1),('B',2),
                            ('C',1),('C',2), ('C',3),
                            ('D',1),('D',2), ('D',4),
                            ('E',1),('E',2),
                            ('F',1),('F',2)

I need to create a report which finds out duplicate data. From the sample data above, report needs to show that data for cat A is duplicated by cat C (notice the num value and no. of records) and cat B is duplicated by cat E and F. What is the best way to show that?

Example output

-------------
|cat | dupby|
-------------
| A  |  C   |
| B  | E, F |
-------------
raj
  • 143
  • 1
  • 11
  • So what would the expected output be? Just rows that contained duplicates? Would the first two sets of A not also be a duplictae set of b? – S3S Mar 23 '17 at 13:14
  • 1
    Hi, I have added expected output, no. of rows for a category needs to match – raj Mar 23 '17 at 13:24
  • Resolved it by working around the solution posted at http://stackoverflow.com/a/6596590/1679692 – raj Mar 24 '17 at 08:56

1 Answers1

1

Updated: switched to traditional set matching using common table expression and the stuff() with select ... for xml path ('') method of string concatenation only on the final results:

;with cte as (
  select *
    , cnt = count(*) over (partition by cat)
  from t
)
, duplicates as (
  select 
        x.cat
      , dup_cat = x2.cat
  from cte as x 
    inner join cte as x2
       on x.cat < x2.cat
      and x.num = x2.num
      and x.cnt = x2.cnt 
  group by x.cat, x2.cat, x.cnt
  having count(*) = x.cnt
)
select 
    d.cat
  , dupby = stuff((
        select ', '+i.dup_cat
        from duplicates i
        where i.cat = d.cat
        for xml path (''), type).value('.','varchar(8000)')
     ,1,2,'')
from duplicates d
where not exists (
    select 1
    from duplicates i
    where d.cat = i.dup_cat
    )
group by d.cat

rextester demo: http://rextester.com/KHAG98718

returns:

+-----+-------+
| cat | dupby |
+-----+-------+
| A   | C     |
| B   | E, F  |
+-----+-------+
SqlZim
  • 37,248
  • 6
  • 41
  • 59
  • Wow, thanks @sqlzim for quick solution. Looks like it will work for my requirement. I will revert after some time. – raj Mar 23 '17 at 13:50
  • I am getting a conversion error, not because your query is faulty, but because in the actual data, cat is int. – raj Mar 23 '17 at 13:57
  • Added convert function, query seems to running but taking too much time. – raj Mar 23 '17 at 14:04
  • Above script works ok with smaller sets, but I need to run this on a table with around 400,000 records. This query is blocking other queries on database and I had to cancel it after 45 minutes. – raj Mar 24 '17 at 07:43
  • @raj Updated to a less costly set matching query. – SqlZim Mar 24 '17 at 12:29
  • Thanks @SqlZim for the help. – raj Apr 03 '17 at 10:55