-2

I have one table of contact records and I'm trying to get the count of duplicate records that were created on each date. I'm not looking to include the original instance in the count. I'm using SQL Server. Here's an example table

| email         | created_on |    
| ------------- | ---------- | 
| aaa@email.com | 08-16-22   |    
| bbb@email.com | 08-16-22   |
| zzz@email.com | 08-16-22   |    
| bbb@email.com | 07-12-22   | 
| aaa@email.com | 07-12-22   |    
| zzz@email.com | 06-08-22   |    
| aaa@email.com | 06-08-22   |    
| bbb@email.com | 04-21-22   |

And I'm expecting to return

| created_on | dupe_count | 
| ---------- | ---------- | 
| 08-16-22   | 3          | 
| 07-12-22   | 2          | 
| 06-08-22   | 0          | 
| 04-21-22   | 0          |

Edited to add error message: error message

beth
  • 1
  • 1
  • "I'm trying to get the count of duplicate records" - what have you tried? We'd love to see. – Dale K Aug 18 '22 at 20:19
  • Does this answer your question? [Finding duplicate values in a SQL table](https://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table) – Stu Aug 18 '22 at 20:22
  • 1
    Why are you expecting a count of `0` for `06-08-22`? Shouldn't that be `2`? – HoneyBadger Aug 18 '22 at 20:34
  • @Stu - I've tried that but I think it's returning all of the emails created on those dates if there was more than one email created and not just the duplicate emails – beth Aug 18 '22 at 20:37
  • @HoneyBadger - the emails on 06-08-22 are the first instances of those 2 so they wouldn't be considered duplicates yet until 07-12-22 and 08-16-22 – beth Aug 18 '22 at 20:40

2 Answers2

2

I created a sub table based on email and created date row number. Then, you query that, and ignore the date when the email first was created (row number 1). Works perfectly fine in this case.

Entire code:

Create table #Temp
(
    email           varchar(50),
    dateCreated     date
)

insert into #Temp
(email, dateCreated) values
('aaa@email.com',  '08-16-22'),   
('bbb@email.com',  '08-16-22'),
('zzz@email.com',  '08-16-22'),     
('bbb@email.com',  '07-12-22'), 
('aaa@email.com',  '07-12-22'),   
('zzz@email.com',  '06-08-22'),   
('aaa@email.com',  '06-08-22'),   
('bbb@email.com',  '04-21-22')   

select datecreated, sum(case when r = 1 then 0 else 1 end) as duplicates
from 
(
    Select email, datecreated, ROW_NUMBER() over(partition by email 
    order by datecreated) as r from #Temp
) b
group by dateCreated
drop table #Temp

Output:

datecreated duplicates
2022-04-21  0
2022-06-08  0
2022-07-12  2
2022-08-16  3
0

You can calculate the difference between total count of emails for every day and the count of unique emails for the day:

    select created_on, 
count(email) - count(distinct email) as dupe_count
    from cte
    group by created_on

It seems I have misunderstood your request, and you wanted to consider previous created_on dates' too:

   ct as (
select created_on,
(select case when (select count(*)
    from cte t2
 where t1.email = t2.email and t1.created_on > t2.created_on
 ) > 0 then email end) as c
 from cte t1)
    
 select created_on,
  count(distinct c) as dupe_count
  from ct
    group by created_on
    order by 1

It seems that in oracle it is also possible to aggregate it using one query:

select created_on,
count(distinct case when (select count(*)
    from cte t2
 where t1.email = t2.email and t1.created_on > t2.created_on
 ) > 0 then email end) as c
 from cte t1
 group by created_on
 order by 1
4yz
  • 156
  • 7
  • I like this approach but I don't think it's accounting for the previous dates, just looking at each date individually. It's returning all zeros for me. – beth Aug 18 '22 at 20:50
  • I liked that approach too. I have edited the reply to add the logic you have probably asked for. – 4yz Aug 18 '22 at 21:52
  • Yes, your edit considering the previous dates worked perfectly in pgAdmin but I can't quite get it to work in SQL Server. Any idea why? – beth Aug 19 '22 at 13:57
  • Did you use the SQL for T-SQL I provided? The other one is for Oracle SQL. T-SQL version is working for me: http://sqlfiddle.com/#!4/97eade/12353 .Also you can provide the error message screenshot. – 4yz Aug 19 '22 at 17:54
  • I used the T-SQL version but I'm still getting the error. I added the screenshot to my original question. I also tried adding 'WITH' to the beginning of the T-SQL query but then I get the error that 'WITH' statement​ is not supported'. Thanks, I really appreciate all of your help. – beth Aug 19 '22 at 18:38
  • Have you tried using CTE (with) on your DB version before? As an alternative you can wrap "ct" into subquery, or use select .... into #ct – 4yz Aug 19 '22 at 18:54
  • http://sqlfiddle.com/#!18/b28da/2374 – 4yz Aug 19 '22 at 19:03