Delete duplicates in SQL

Question

For an interview, I had to write a SQL query that deletes BestellNummer and Type duplicates from a table.

I wasn't allowed to use a temp table and should do it in one query.

With the help of another question on Stack Overflow (T-SQL: Deleting all duplicate rows but keeping one), I came to this solution:

DELETE FROM auftrag WHERE ID NOT IN
(
    SELECT MIN(ID) FROM auftrag GROUP BY BestellNummer, Type
)

The auftrag table looked like this:

ID  BestellNummer   Type    Number   
0   123             O       1000
1   123             O       1001
2   123             E       1002
3   512             O       1003
4   512             O       1004
5   732             E       1005

The query now deletes the rows with ID 1 and 4, because they are duplicates.

My question is, how does this query actually work? I can make out some bits, but I am a little confused by it.

It would be nice, if someone could give me a breakdown of how it works (:

Check again, because the accepted answer is different, even in the question you referred to. — Kadet, Jun 29 '22 at 22:08
The rows with ID 1 and 4 are not duplicates, the `Number` is different. — HoneyBadger, Jun 29 '22 at 22:38
Ah, sorry, duplicates are only rows where BestellNummer and Type are the same; Number should be ignored. — ILikeSahne, Jun 30 '22 at 09:11
@ILikeSahne Do you need to delete the duplicates and keep one row per distinct value, or do you need to delete all rows that have duplicates? Give me clarity and I'll give you the solution. — Upender Reddy, Jun 30 '22 at 09:18

score 4 · Answer 1 · answered Jun 29 '22 at 22:09

The GROUP BY clause divides your data into groups, one for each unique combination of the columns BestellNummer and Type. Here I have separated the rows with lines to show the groups:

ID  BestellNummer   Type    Number  
----------------------------------
0   123             O       1000
1   123             O       1001
----------------------------------
2   123             E       1002
----------------------------------
3   512             O       1003
4   512             O       1004
----------------------------------
5   732             E       1005

Then MIN(ID) simply finds the minimum value of the ID column in each group, leaving you with IDs 0, 2, 3, and 5.

Then the DELETE says to delete the rows whose ID is NOT IN (0, 2, 3, 5), thereby deleting rows 1 and 4 and leaving you with one row per unique combination of BestellNummer and Type.
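To watch both steps happen, here is a small sketch that runs the subquery on its own and then the full DELETE against the sample data. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for SQL Server (that choice, the column types, and the variable names are my assumptions, not part of the question), so treat it as an illustration rather than the exact T-SQL behaviour.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auftrag ("
    "ID INTEGER PRIMARY KEY, BestellNummer INTEGER, Type TEXT, Number INTEGER)"
)
conn.executemany(
    "INSERT INTO auftrag VALUES (?, ?, ?, ?)",
    [
        (0, 123, "O", 1000),
        (1, 123, "O", 1001),
        (2, 123, "E", 1002),
        (3, 512, "O", 1003),
        (4, 512, "O", 1004),
        (5, 732, "E", 1005),
    ],
)

# Step 1: the subquery alone returns the smallest ID of each
# (BestellNummer, Type) group, i.e. the rows that will be kept.
keep_ids = sorted(
    row[0]
    for row in conn.execute(
        "SELECT MIN(ID) FROM auftrag GROUP BY BestellNummer, Type"
    )
)

# Step 2: the DELETE removes every row whose ID is not in that list.
conn.execute(
    "DELETE FROM auftrag WHERE ID NOT IN ("
    "SELECT MIN(ID) FROM auftrag GROUP BY BestellNummer, Type)"
)
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM auftrag ORDER BY ID")]

print("IDs kept by the subquery:", keep_ids)
print("IDs left after the DELETE:", remaining_ids)
```

Both lists come out the same, which is the whole point: the subquery decides which rows survive, and the outer DELETE removes everything else.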

Ah, thanks, I get it now. I didn't really understand what GROUP BY does. — ILikeSahne, Jun 29 '22 at 22:13

score 0 · Answer 2 · answered Jun 30 '22 at 09:26

You can delete duplicate records of BestellNummer and Type in the following way:

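WITH cte AS (
    -- Number the rows within each (BestellNummer, Type) group, lowest ID first
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [BestellNummer], [Type] ORDER BY [ID]) AS [rn]
    FROM auftrag
)
-- Every row after the first one in its group is a duplicate
DELETE FROM cte WHERE [rn] > 1;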

The query above is written for Microsoft SQL Server (T-SQL).
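To see what the row numbering does before anything is deleted, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption on my part, as are the variable names). SQLite needs version 3.25 or newer for window functions, and it does not allow deleting through a CTE the way T-SQL does, so the delete below is rewritten with an ID IN (...) subquery; the numbering logic is the same.

```python
import sqlite3

# Assumption: an in-memory SQLite database (3.25+) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auftrag ("
    "ID INTEGER PRIMARY KEY, BestellNummer INTEGER, Type TEXT, Number INTEGER)"
)
conn.executemany(
    "INSERT INTO auftrag VALUES (?, ?, ?, ?)",
    [
        (0, 123, "O", 1000),
        (1, 123, "O", 1001),
        (2, 123, "E", 1002),
        (3, 512, "O", 1003),
        (4, 512, "O", 1004),
        (5, 732, "E", 1005),
    ],
)

# Same window function as in the CTE: rows are numbered 1, 2, ... inside
# each (BestellNummer, Type) group, ordered by ID.
numbering_sql = (
    "SELECT ID, ROW_NUMBER() OVER ("
    "PARTITION BY BestellNummer, Type ORDER BY ID) AS rn FROM auftrag"
)
numbered = conn.execute(numbering_sql + " ORDER BY ID").fetchall()
duplicate_ids = [row_id for row_id, rn in numbered if rn > 1]

# SQLite-specific rewrite of "DELETE FROM cte WHERE rn > 1".
conn.execute(
    "DELETE FROM auftrag WHERE ID IN ("
    "SELECT ID FROM (" + numbering_sql + ") AS numbered_rows WHERE rn > 1)"
)
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM auftrag ORDER BY ID")]

print("(ID, rn) pairs:", numbered)
print("IDs with rn > 1:", duplicate_ids)
print("IDs left after the DELETE:", remaining_ids)
```

Because the numbering is ordered by ID, the row with the lowest ID in each group gets rn = 1 and is kept, which matches the result of the MIN(ID) query from the question on this sample data.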
