SQL query to return duplicate rows for certain column, but with unique values for another column

Question

I have written the query shown here that combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_sys_ref value

select top 100 
    t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from 
    tickets t, appeal_tickets at, appeals_2 a
where 
    t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
    and t.t_number = at.at_ticket_num
    and at.at_system_ref = a.a_system_ref
    and at.at_ticket_num IN (select at_ticket_num
                             from appeal_tickets
                             group by at_ticket_num
                             having count(distinct at_system_ref) > 1)
order by 
    t.t_reference desc

This is the output:

t_reference  at_system_ref  at_ticket_num   a_case_ref
-------------------------------------------------------
    AB123       30838974      23641583      1111979010
    AB123       30838976      23641583      1111979010
    AB234       30839149      23641520      1111977352
    AB234       30839209      23641520      1111988003

I want to modify this so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in above case only records for AB234 would be returned.

Any help would be much appreciated.

Note that tables have rows and columns, not records and fields. — jarlh, Feb 15 '22 at 17:38
Tip of today: Always use modern, explicit `JOIN` syntax! Easier to write (without errors), easier to read and maintain, and easier to convert to outer join if needed! — jarlh, Feb 15 '22 at 17:38
[Bad habits to kick : using old-style JOINs](https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-old-style-joins) - that old-style *comma-separated list of tables* style was replaced with the *proper* ANSI `JOIN` syntax in the ANSI-**92** SQL Standard (**30 years!!** ago) and its use is discouraged — marc_s, Feb 15 '22 at 17:51
I can't get it to work. I get syntax errors all over the place. Shouldn't make any difference but I changed my original query to use joins as suggested above. — chucknor, Feb 15 '22 at 22:00
It may be a good idea to set up a fiddle in https://dbfiddle.uk/, so we see the tables with some sample data in them and can try our queries. — Thorsten Kettner, Feb 16 '22 at 16:46

shawnt00 · Answer 1 · 2022-02-15T21:59:27.090

0

with data as (
    <your query plus one column>,
    case when
        min() over (partition by t.t_reference)
        <>
        max() over (partition by t.t_reference)
        then 1 end as dup
)
select * from data where dup = 1

edited Feb 15 '22 at 21:59

answered Feb 15 '22 at 18:00

shawnt00

16,443
3
17
22

What do you mean by plus one column. I am getting syntax errors around the case statement when i try this query. – chucknor Feb 15 '22 at 21:13
I mean insert your query along with the additional column described. – shawnt00 Feb 15 '22 at 21:34

Thorsten Kettner · Accepted Answer · 2022-02-16T06:45:19.407

0

You want all ticket appeals that have more than one system reference and more than one case reference it seems. You can join the tables, count the occurrences per ticket and then only keep the tickets that match these criteria.

select *
from
(
  select
    t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
    count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
    count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
  from tickets t
  join appeal_tickets at on at.at_ticket_num = t.t_number
  join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref;

Correction

It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a subquery though. Replace

count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,

by

(
  select count(distinct a2.a_system_ref)
  from appeal_tickets at2
  join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
  where at2.at_ticket_num = t.t_number
) as sysrefs,

An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):

dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) - 
1 as sysrefs,

edited Feb 16 '22 at 06:45

answered Feb 15 '22 at 18:01

Thorsten Kettner

89,309
7
49
73

When I execute this query I get the error 'Use of DISTINCT is not allowed with the OVER clause.' googling this error online suggests that I need to emply the dense_rank function. My SQL knowledge is pretty weak and I've never even seen dense_rank before. I will need to do some reading to see if I can figure it out. Thanks. – chucknor Feb 15 '22 at 21:19
That is poor showing on SQL Server's side. They don't support `DISTINCT` in window functions it seems. I have updated my answer with two solutions for this. – Thorsten Kettner Feb 16 '22 at 06:47
Sorry mate. This hasn't worked. I had to modify your sub query correction above as it didn't include a join on the tickets table. I added one for the case refs that looks like ...( select count(distinct a2.a_case_ref) from appeal_tickets at2 join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref join tickets t on t.t_number = at2.at_ticket_num where at2.at_ticket_num = t.t_number ) as caserefs. But when I run the query filtering by just the 2 ticket numbers I get ... – chucknor Feb 16 '22 at 10:01
t_reference at_system_ref at_ticket_num a_case_ref sysrefs caserefs AB123 30839149 23641520 2222977352 169967 158847 AB123 30839209 23641520 2222988003 169967 158847 AB234 30838974 23641583 2222979010 169967 158847 AB234 30838976 23641583 2222979010 169967 158847 .. – chucknor Feb 16 '22 at 10:02
i.e. with the counts of unique systemrefs and caserefs at the end and it is not filtering out the ticket number where the caseref is the same. – chucknor Feb 16 '22 at 11:20
You have mutilated the subquery. It counted references for the ticket number in your main query (`where at2.at_ticket_num = t.t_number`), but you made it count all references by removing this dependency and selecting all tickets instead. There must be no join to the tickets table. – Thorsten Kettner Feb 16 '22 at 16:40
That subquery didn't work though as it was referencing t_number which is from the tickets table. I think I got it working using the dense_Rank method you suggested. Its hard to tell if it is giving me exactly what I want but it does work when filtering on the 2 specific tickets i.e. doesn't include the one associated to 2 cases with same a_case_ref number but does return the one assoicated to diff a_case_ref's . I'll have a look at this again tomorrow. – chucknor Feb 16 '22 at 21:01
The subquery must reference t_number in order to be correlated to the main query. How else is the subquery to know for which ticket number to count? It sounds like you are not using my full query where the from clause starts with `from tickets t`. Well, the dense_rank soultion is more compact anyhow. Please study how the subquery works and how the dense_rank solution works in order to learn something from this rather than just copy and paste. – Thorsten Kettner Feb 17 '22 at 06:12

SQL query to return duplicate rows for certain column, but with unique values for another column

2 Answers2

Correction