Remove rows with duplicate values

Question

I have to clean a table with duplicate rows:

id: serial id
gid: group id
url: string <- this is the column that I have to cleanup

One gid may have multiple url values:

id    gid   url
----  ----  ------------
1     12    www.gmail.com
2     12    www.some.com
3     12    www.some.com <-- duplicate
4     13    www.other.com
5     13    www.milfsome.com <-- not a duplicate

I want to execute one query against the entire table and delete all rows where the gid and url are duplicate. In the above sample, after the delete, I want to have only 1, 2, 4 and 5 remaining.

What version of SQL? (Also, there are other posts on this exact topic.) — Eric J. Price, Apr 04 '13 at 14:43
Can you please be less ambiguous. Do you want to *DELETE* data or do you just want to hide the redundant data in a query? Saying things like "remove rows" and "clean a table" and then contradicting yourself with things like "keep them as they are" makes trying to help you very difficult as you can see from the answers and comments below. — Aaron Bertrand, Apr 04 '13 at 14:54
"What does clean a table mean?" Here we have a product called "Spray and Wipe" ;-) — Sepster, Apr 04 '13 at 14:58

Aaron Bertrand · Accepted Answer · 2013-04-04T14:49:13.487

13

;WITH x AS 
(
   SELECT id, gid, url, rn = ROW_NUMBER() OVER
     (PARTITION BY gid, url ORDER BY id) 
   FROM dbo.table
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete

Once you're happy with the first select, which indicates the rows you'll keep, remove it and un-comment the second select. Once you're happy with that, which indicates the rows you'll delete, remove it and un-comment the delete.

And if you don't want to delete data, just ignore the commented lines under the SELECT...

edited Apr 04 '13 at 14:49

answered Apr 04 '13 at 14:43

Aaron Bertrand

272,866
37
466
490

+1 because this is the right answer, but I'm unclear on whether you want to include gid in the partition by. I think based upon the language that he may want it included. – Eric J. Price Apr 04 '13 at 14:46
@Love2Learn yep, updates crossed – Aaron Bertrand Apr 04 '13 at 14:46
Gotcha, missed that one. :) – Eric J. Price Apr 04 '13 at 14:46
Same approach worked with Oracle also – IT ppl Sep 14 '15 at 13:45
1

This looked foreign to me, at first, then as I wrote this up to accommodate my own table and needs, it started making a lot of sense, and worked beautifully! Thanks, Aaron! Great answer! Helped so much, and not a single syntactical error! – VoidKing Jan 15 '16 at 19:23
This is not working for me. I replaces `x` with my table name and the columns with columns in my own table but got the error... ` You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'expenses AS ( SELECT expense_id, user_id, transaction_date, bank, rn = ROW_N' at line 1 ` – Pila Jul 10 '17 at 14:59
1

@Pila This question is about Microsoft SQL Server, not MySQL. – Aaron Bertrand Jul 10 '17 at 18:18
@AaronBertrand I figured. :) I got my answer from another StackOverflow question though – Pila Jul 11 '17 at 07:41

score 1 · Answer 2 · edited Apr 04 '13 at 14:53

1

SELECT 
MIN(id) AS id,
gid,
url
FROM yourTable
GROUP BY gid, url

edited Apr 04 '13 at 14:53

Aaron Bertrand

272,866
37
466
490

answered Apr 04 '13 at 14:44

fancyPants

50,732
33
89
96

May I quote the OP: " i have to keep them as they are" I see nowhere anything mentioned of deleting anything. – fancyPants Apr 04 '13 at 14:47
"remove rows" and "clean a table"? – Aaron Bertrand Apr 04 '13 at 14:48
Okay, okay, I'll let my answer anyway, just so OP sees how those "GROUP BY and HAVING tricks" work. – fancyPants Apr 04 '13 at 14:51
+1 @AaronBertrand I have to disagree - the question is ambiguous... I also took it to mean something along the lines of "how can I _present_ the data 'cleaned-up', with duplicates removed, while leaving the source data in place". I think I'd err on the side of NOT deleting ;-) – Sepster Apr 04 '13 at 14:51

Remove rows with duplicate values

2 Answers2

Linked