0

I am struggling with the appropriate query to find duplicates while at the same time respecting the effective start and end dates for the record. Example data below.

ClientName   ClientID   EffectiveStart  EffectiveEnd
A            1          1900-01-01      2100-01-01
A            1          1900-01-01      2100-01-01
B            2          1900-01-01      2012-05-01
C            2          2012-05-01      2100-01-01
D            3          1900-01-01      2012-05-01
E            3          2012-04-30      2100-01-01
F            4          2012-04-15      2100-01-01

The output I am looking for is below.

ClientName  ClientID
A           1
D           3
E           3

The logic is that Client A has ID 1 duplicated. Client B and C have a duplicate (2) BUT the date ranges are such that the two duplicates DO NOT overlap each other, which means they should not be considered duplicates. Client D and E have ID 3 duplicated AND the date ranges overlap so it should be considered a duplicate. Client F does not have a duplicate so should not show in the output.

Any thoughts? Questions?

thomas
  • 2,592
  • 1
  • 30
  • 44

2 Answers2

1

There are two versions. Exists is simpler but likely slower than join. Exists checks for each record if there is an overlapping record per same clientid; it is bound to find at least one, itself, hence group by and having.

select distinct ClientName, ClientID
  from Table1
 where exists
 (
   select null
     from table1 test1
    where test1.clientid = table1.clientid
      and test1.EffectiveStart < table1.EffectiveEnd
      and test1.EffectiveEnd > table1.EffectiveStart
    group by test1.ClientID
   having count (*) > 1
 )

Join does the same, but as grouping is done on all records its having has to count them all.

select test1.clientid
from table1 test1 
join table1 test2 on test1.clientid = test2.clientid
where test1.EffectiveStart < test2.EffectiveEnd 
  and test1.EffectiveEnd > test2.EffectiveStart
group by test1.clientid
having count (*) > (select count (*) 
                      from table1 
                     where clientid = test1.clientid)

I omitted retrieval of clientname because people hate to see nested queries.

Live test is at Sql Fiddle.

Nikola Markovinović
  • 18,963
  • 5
  • 46
  • 51
0

Will need a PK

select c1.name, c2.name, c1.id
from client c1 
join client c2 on c1.id = c2.id and c1.PK < c2.PK
where c1.Start > c2.End or c1.End < c2.Start

Determine Whether Two Date Ranges Overlap please give him a +1

Community
  • 1
  • 1
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • This worked. Thanks. I already knew the logic to check for intersecting date ranges...just couldn't figure out how to join it all together. The PK was the key for me. Thanks! – thomas May 07 '12 at 20:49