I hope you can help me.
I have a table of data in a SQL database (Table1) which contains information in 5 columns (1,2,3,4,5).
I require a query which does the following:
- Some of these rows may be duplicated and these duplicates need to be removed.
- Some of these rows may have the same key generated by combining columns 1,2,3 but different data in columns in 4 and 5. The rule is as follows: keep the data with the most duplicates, and if there are an equal number of duplicates (one set of duplicates, i.e. two rows where 1,2,3 are the same and 4&5 differ), keep the one most recently entered into the system (i.e. the top of a DESC list).
For example:
1,2,3,4,5
a,a,a,b,b
a,a,a,b,b
a,a,a,b,c
Here, we would keep only the second row - 'a,a,a,b,b' as there are two of these duplicated rows and one of the other, 'a,a,a,b,c'. If there were two duplicated rows of both types of data, we would keep the one most recently entered (a,a,a,b,c).
I've been attempting to solve this using a combination of RANK(), ROW_NUMBER() and COUNT() without success, as i'm relatively new to SQL. I've been using a combination of the following threads:
how to select the most frequently appearing values?
Select first row in each GROUP BY group?
SQL Query Select first rank 1 row From Multiple ranks/Group
I hope this explanation has been clear.
Thanks for reading!