How do I Select all duplicates

Question

I am trying to develop a DMR from the DMR #1 which return all the data which are identified by DMR #1 so that I can load them to a work table and then delete the duplicates before inserting back to the main table. So I have DMR #2 but when execute it, it returns twice the rows which are identified by DMR #1. I need help with DMR #2 which will return only the number of rows identified by DMR #1

 --- DMR #1
 SELECT PHRMCY_CLAIM_ID
    ,PHRMCY_CLAIM_ID_SEQ_NUM
    ,BUSN_ENTY_ID
    ,MBR_ACCT_ID
  ,Min(DW_UPDT_USR) AS min_DW_UPDT_USR
    ,Max(DW_UPDT_USR) AS max_DW_UPDT_USR
    ,Count(1) AS rec_cnt
    FROM (SELECT 
             svc.PHRMCY_CLAIM_ID
            ,svc.PHRMCY_CLAIM_ID_SEQ_NUM
            ,svc.BUSN_ENTY_ID
            ,svc.MBR_ACCT_ID
            ,pcm.DW_UPDT_USR
          FROM USS.RXDW_PHRMCY AS SVC
          JOIN USS.MBR_ACCT AS pcm ON svc.PHRMCY_CLAIM_ID = pcm.PHRMCY_CLAIM_ID 
           and svc.PHRMCY_CLAIM_ID_SEQ_NUM = pcm.PHRMCY_CLAIM_ID_SEQ_NUM
           and svc.ADJUD_DT = pcm.ADJUD_DT
         ) AS S
GROUP BY 1,2,3,4
HAVING COUNT(1)>1;


--- DMR #2
SELECT M.* FROM USS.MBR_ACCT M
WHERE (M.PHRMCY_CLAIM_ID,M.PHRMCY_CLAIM_ID_SEQ_NUM,M.MBR_ACCT_ID) IN
( 
SELECT PHRMCY_CLAIM_ID
    ,PHRMCY_CLAIM_ID_SEQ_NUM
    ,MBR_ACCT_ID
FROM (SELECT 
         svc.PHRMCY_CLAIM_ID
        ,svc.PHRMCY_CLAIM_ID_SEQ_NUM
        ,svc.BUSN_ENTY_ID
        ,svc.MBR_ACCT_ID
        ,pcm.DW_UPDT_USR
     FROM USS.RXDW_PHRMCY AS SVC
     JOIN USS.MBR_ACCT AS pcm ON svc.PHRMCY_CLAIM_ID = pcm.PHRMCY_CLAIM_ID
      and svc.PHRMCY_CLAIM_ID_SEQ_NUM = pcm.PHRMCY_CLAIM_ID_SEQ_NUM
      and svc.ADJUD_DT = pcm.ADJUD_DT
     ) AS S
GROUP BY 1,2,3
HAVING COUNT(1)>1);

Of course, you get *all* the rows where duplicates exist. You need to add another column(s) which can be used to identify the individual rows, e.g. ` (...., whatever) IN (select ..., min(whatever))`. But if there's a unique set of columns you can also directly delete those rows using a delete where exists... — dnoeth, Jul 30 '22 at 09:15

simpleuser008 · Answer 1 · 2022-10-27T10:47:43.190

usuall solution is to use row_number() over (partition by id1,id2,id3 order by somecolumn ) in select phase to sort them and then select only rows that have row_number = 1

that can be done in qualify clause after where clause. "qualify row_number() over (parition by x,y,x order by z ) = 1"

This selects first row for each unique partition. if you want to delete then qualify clause > 1

Some other systems do support using row_number to delete in delete clause or in cte, no idea (yet) if teradata does it

How do I Select all duplicates

1 Answers1