5

I have to clean a table with duplicate rows:

id: serial id
gid: group id
url: string <- this is the column that I have to cleanup

One gid may have multiple url values:

id    gid   url
----  ----  ------------
1     12    www.gmail.com
2     12    www.some.com
3     12    www.some.com <-- duplicate
4     13    www.other.com
5     13    www.milfsome.com <-- not a duplicate

I want to execute one query against the entire table and delete all rows where the gid and url are duplicate. In the above sample, after the delete, I want to have only 1, 2, 4 and 5 remaining.

eyurdakul
  • 894
  • 2
  • 12
  • 29
  • What version of SQL? (Also, there are other posts on this exact topic.) – Eric J. Price Apr 04 '13 at 14:43
  • 1
    Can you please be less ambiguous. Do you want to *DELETE* data or do you just want to hide the redundant data in a query? Saying things like "remove rows" and "clean a table" and then contradicting yourself with things like "keep them as they are" makes trying to help you very difficult as you can see from the answers and comments below. – Aaron Bertrand Apr 04 '13 at 14:54
  • what does clean a table mean? delete them forever. – eyurdakul Apr 04 '13 at 14:56
  • 1
    "What does clean a table mean?" Here we have a product called "Spray and Wipe" ;-) – Sepster Apr 04 '13 at 14:58

2 Answers2

13
;WITH x AS 
(
   SELECT id, gid, url, rn = ROW_NUMBER() OVER
     (PARTITION BY gid, url ORDER BY id) 
   FROM dbo.table
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete

Once you're happy with the first select, which indicates the rows you'll keep, remove it and un-comment the second select. Once you're happy with that, which indicates the rows you'll delete, remove it and un-comment the delete.

And if you don't want to delete data, just ignore the commented lines under the SELECT...

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • +1 because this is the right answer, but I'm unclear on whether you want to include gid in the partition by. I think based upon the language that he may want it included. – Eric J. Price Apr 04 '13 at 14:46
  • @Love2Learn yep, updates crossed – Aaron Bertrand Apr 04 '13 at 14:46
  • Gotcha, missed that one. :) – Eric J. Price Apr 04 '13 at 14:46
  • Same approach worked with Oracle also – IT ppl Sep 14 '15 at 13:45
  • 1
    This looked foreign to me, at first, then as I wrote this up to accommodate my own table and needs, it started making a lot of sense, and worked beautifully! Thanks, Aaron! Great answer! Helped so much, and not a single syntactical error! – VoidKing Jan 15 '16 at 19:23
  • This is not working for me. I replaces `x` with my table name and the columns with columns in my own table but got the error... ` You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'expenses AS ( SELECT expense_id, user_id, transaction_date, bank, rn = ROW_N' at line 1 ` – Pila Jul 10 '17 at 14:59
  • 1
    @Pila This question is about Microsoft SQL Server, not MySQL. – Aaron Bertrand Jul 10 '17 at 18:18
  • @AaronBertrand I figured. :) I got my answer from another StackOverflow question though – Pila Jul 11 '17 at 07:41
1
SELECT 
MIN(id) AS id,
gid,
url
FROM yourTable
GROUP BY gid, url 
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
fancyPants
  • 50,732
  • 33
  • 89
  • 96
  • May I quote the OP: " i have to keep them as they are" I see nowhere anything mentioned of deleting anything. – fancyPants Apr 04 '13 at 14:47
  • "remove rows" and "clean a table"? – Aaron Bertrand Apr 04 '13 at 14:48
  • Okay, okay, I'll let my answer anyway, just so OP sees how those "GROUP BY and HAVING tricks" work. – fancyPants Apr 04 '13 at 14:51
  • +1 @AaronBertrand I have to disagree - the question is ambiguous... I also took it to mean something along the lines of "how can I _present_ the data 'cleaned-up', with duplicates removed, while leaving the source data in place". I think I'd err on the side of NOT deleting ;-) – Sepster Apr 04 '13 at 14:51