Removing duplicates in SQL Server 2012 by grouping data with multiple rules

Question

I hope you can help me.

I have a table of data in a SQL database (Table1) which contains information in 5 columns (1,2,3,4,5).

I require a query which does the following:

Some of these rows may be duplicated and these duplicates need to be removed.
Some of these rows may have the same key generated by combining columns 1,2,3 but different data in columns in 4 and 5. The rule is as follows: keep the data with the most duplicates, and if there are an equal number of duplicates (one set of duplicates, i.e. two rows where 1,2,3 are the same and 4&5 differ), keep the one most recently entered into the system (i.e. the top of a DESC list).

For example:

1,2,3,4,5

a,a,a,b,b

a,a,a,b,c

Here, we would keep only the second row - 'a,a,a,b,b' as there are two of these duplicated rows and one of the other, 'a,a,a,b,c'. If there were two duplicated rows of both types of data, we would keep the one most recently entered (a,a,a,b,c).

I've been attempting to solve this using a combination of RANK(), ROW_NUMBER() and COUNT() without success, as i'm relatively new to SQL. I've been using a combination of the following threads:

how to select the most frequently appearing values?

Select first row in each GROUP BY group?

SQL Query Select first rank 1 row From Multiple ranks/Group

I hope this explanation has been clear.

Thanks for reading!

score 1 · Answer 1 · answered Nov 19 '15 at 18:12

1

guess this should help:

with cte as (
select 
     column1, 
     column2,
     column3,
     column4,
     column5, 
     ROW_NUMBER() OVER(PARTITION BY Column1, Column2, Column3, Column4, Column5 ORDER BY Column4 DESC) AS R
 from 
     YourTable
)

select * from cte where r = 1

this is the way i always remove my duplicates ... hope this helps ...

answered Nov 19 '15 at 18:12

csarwi

35
6

Thanks, this is removes duplicates, but doesn't answer the second rule - if there are two sets of duplicate rows (i.e. 4 rows, two duplicated), the query must select only the most recent row to be entered. This script simple selects the first in the order, not the most frequently duplicated item. – stax1001 Nov 20 '15 at 14:22

Removing duplicates in SQL Server 2012 by grouping data with multiple rules

1 Answers1