Removing partial duplicates from a Union

Question

I have 2 datasets used for billing purposes and I've been asked to provide a report that summarises both. I'm having an issue where the system shows some of the billing in both places, as one side bills on Part No and Sales Order, and the other on Project and Customer, and I'm struggling to remove the duplicate.

The table post union is in the following format

REF	CUSTOMER	NETT_PRICE	DATE_CREATED	PART_NO	SALESORDER	PROJECT
AC	ABC	100	20/3/23	123	7001	PR01
AC	ABC	150	20/3/23	123	7001	PR01
AC	ABC	170	20/3/23	123	7002	PR02
FP	ABC	100	20/3/23	-	-	PR01
FP	ABC	150	20/3/23	-	-	PR01
FP	ABC	200	20/3/23	-	-	PR02

In the case of lines 1&4, and 2&5, they are the same billing as the NETT_PRICE, PROJECT and DATE_CREATED are the same, where REF dictates which area the billing came from. Where it's duplicated it has to be the AC reference that remains, and the order isn't always AC first.

My attempts at using ROW_NUMBER with OVER have not been successful, as shown below. Any guidance on this would be appreciated, or if ROW_NUMBER is not the way to go.

SELECT [REF]
      ,[NETT_PRICE]
      ,[QTY_ENTERED]
      ,[CREATED]
      ,[CUS]
      ,[SALESORDER]
      ,[PART_ONLY]
      ,ROW_NUMBER() over (partition by [NETT_PRICE], [QTY_ENTERED], [CREATED], [CUS], [PART_ONLY]
                            order by [CREATED], [OUR], [REF]
                        )
FROM [UDEF_SALESAREA_SUMMARY_TBL] WHERE ROW_NUMBER() > 1

From the above table, the expected output should be:

REF	CUSTOMER	NETT_PRICE	DATE_CREATED	PART_NO	SALESORDER	PROJECT
AC	ABC	100	20/3/23	123	7001	PR01
AC	ABC	150	20/3/23	123	7001	PR01
AC	ABC	170	20/3/23	123	7002	PR02
FP	ABC	200	20/3/23	-	-	PR02

Where the FP duplicates on NETT_PRICE, PROJECT and DATE_CREATED are removed.

Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) — Thom A, Mar 30 '23 at 07:06
@ThomA almost, but the amount of entries in a day is not fixed. — Saiyanthou, Mar 30 '23 at 07:22

score 1 · Answer 1 · answered Mar 30 '23 at 08:54

How about something like this:

select *
from (
    SELECT  *
    , row_number() OVER(partition BY customer, NETT_PRICE, date_created ORDER BY CASE WHEN ref = 'ac' THEN 0 ELSE 1 END) AS sort
    FROM    (
        VALUES  (N'AC', N'ABC', 100, N'20/3/23', 123, 7001, N'PR01')
        ,   (N'AC', N'ABC', 150, N'20/3/23', 123, 7001, N'PR01')
        ,   (N'AC', N'ABC', 170, N'20/3/23', 123, 7002, N'PR02')
        ,   (N'FP', N'ABC', 100, N'20/3/23', NULL, NULL, N'PR01')
        ,   (N'FP', N'ABC', 150, N'20/3/23', NULL, NULL, N'PR01')
        ,   (N'FP', N'ABC', 200, N'20/3/23', NULL,NULL, N'PR02')
    ) t (REF,CUSTOMER,NETT_PRICE,DATE_CREATED,PART_NO,SALESORDER,PROJECT)
) x
where x.sort = 1

This "groups" every sales by customer, price and date_created and prioritizes AC ref by doing orderby. You can add more tie-breakers to order by in case there are more than one REFs.

This looks to do the trick, thank you. I have limited access to the actual data this morning, but seems to look okay on the test data I have. — Saiyanthou, Mar 31 '23 at 07:19

Removing partial duplicates from a Union

1 Answers1