0

I have a report table (dbo.HighDonorPayments) with data like below:

ID   PAYMENTID  DONORID  MEMBERFIRSTNAME   MEMBERLASTNAME    ISBIGCHEESE
--------------------------------------------------------------------------
1    4133-ggi     661a   Samuel            Jackson           1
2    4133-ggi     661a   Samuel            Jackson           0          
3    4133-ggi     661a   Samuel            Jackson           0        
4    9392-ggi     990q   Space             Eye               0
5    9392-ggi     990q   Space             Eye               0

I need to remove the duplicate Payments but in a very specific way.

The end result should be:

ID   PAYMENTID  DONORID  MEMBERFIRSTNAME   MEMBERLASTNAME    ISBIGCHEESE
--------------------------------------------------------------------------
1    4133-ggi     661a   Samuel            Jackson           1
4    9392-ggi     990q   Space             Eye               0

OR

ID   PAYMENTID  DONORID  MEMBERFIRSTNAME   MEMBERLASTNAME    ISBIGCHEESE
--------------------------------------------------------------------------
1    4133-ggi     661a   Samuel            Jackson           1
5    9392-ggi     990q   Space             Eye               0

We have a "categorization" table for donors. Example: Give 10 bucks and you're "nice", give 25 and you're "great", give 50 and you're "big cheese". The catch is that (despite my protests), if you give us 10 bucks and are categorized "nice" THEN give us 50 later and become "BIG CHEESE", you have both a "nice" record AND the BIGCHEESE record in the categorization table. The client wants to see payments over a period of time. If a donor is "BIG CHEESE", she wants to see that record flagged.

Because I join to the "categorization" table, I end up with duplicate payment records. I need to remove the duplicates in such a way that any "ISBIGCHEESE=1" record is "protected" but other cases are de-duplicated as usual.

So for the case of Samuel Jackson, I would like to remove rows 2 and 3 but protect row 1. But for Space Eye, it doesn't matter which record is removed.

I tried using a CTE to get rid of the duplicates:

WITH eye_cte AS 
(
    SELECT 
        paymentid, 
        ROW_NUMBER() OVER (PARTITION BY paymentid ORDER BY revenueid) row_num 
    FROM 
        dbo.HighDonorPayments
) 
DELETE FROM eye_cte 
WHERE row_num > 1;

This works to remove duplicates but the record that is "saved" is...random, I suppose. I'm not sure how to say, "Hey! Remove all the duplicates for the case where PAYMENTID is identical but if the member has a BIGCHEESE rating, "protect" that one.

I've no idea if what I'm trying to do is possible and I might just have to change the query process that populates this table in the first place.

Thom A
  • 88,727
  • 11
  • 45
  • 75

3 Answers3

0

I think this should be as easy as making a flag

WITH eye_cte AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY paymentid ORDER BY revenueid) row_num,
    CASE WHEN isbigcheese = 1 THEN 1 ELSE 0 END as is_protected
  FROM DBO.highdonorpayments
)
DELETE FROM eye_cte
WHERE row_num > 1 AND is_protected = 0
artemis
  • 6,857
  • 11
  • 46
  • 99
0

If isbigcheese can only be 1 or 0, MAX(isbigcheese) should do what you need. Fiddle

SELECT 
  DISTINCT
       MIN(ID) ID
     , PAYMENTID
     , DONORID
     , MEMBERFIRSTNAME
     , MEMBERLASTNAME
     , MAX(ISBIGCHEESE) ISBIGCHEESE
  FROM HighDonorPayments
GROUP BY DONORID, PAYMENTID, MEMBERFIRSTNAME, MEMBERLASTNAME

Returns

ID PAYMENTID DONORID MEMBERFIRSTNAME MEMBERLASTNAME ISBIGCHEESE
1 4133-ggi 661a Samuel Jackson 1
4 9392-ggi 990q Space Eye 0
Tom Boyd
  • 385
  • 1
  • 7
0

I need to remove the duplicates in such a way that any "ISBIGCHEESE=1" record is "protected" but other cases are de-duplicated as usual

You want to "protect" these rows by generating row_num=1 for them because you are DELETE-ing WHERE row_num > 1.

So you have to refactor the ORDER BY of your ROW_NUMBER column so that ISBIGCHEESE=1 rows come first.

To push these rows to the front try ORDER BY CASE WHEN ISBIGCHEESE=1 THEN 0 ELSE 1 END, ... there like this

WITH eye_cte AS 
(
    SELECT 
        paymentid, 
        ROW_NUMBER() OVER (PARTITION BY paymentid ORDER BY
             CASE WHEN ISBIGCHEESE=1 THEN 0 ELSE 1 END, revenueid) row_num 
    FROM 
        dbo.HighDonorPayments
) 
DELETE FROM eye_cte 
WHERE row_num > 1;
wqw
  • 11,771
  • 1
  • 33
  • 41