MySQL delete duplicates in same table

Question

I've been slamming my head on this problem and viewed a few similar situations on stack exchange and the more I read the more confused I get! I have a table that has the following:

id | zipcode | provider | channel
---------------------------------
1  | 91773   | 342      | 0
2  | 91773   | 2261     | 177
3  | 91773   | 5590     | 0
4  | 91773   | 5590     | 0
5  | 91773   | 5590     | 135
6  | 91773   | 5590     | 0
7  | 91773   | 6010     | 0
8  | 91773   | 6010     | 0

I want to keep only one record and if the channel has anything but a 0 then we keep it. So it should return this:

id | zipcode | provider | channel
---------------------------------
1  | 91773   | 342      | 0
2  | 91773   | 2261     | 177
5  | 91773   | 5590     | 135
8  | 91773   | 6010     | 0

I tried quite a few queries, but none worked. Thanks in advance.

Edit: I've tried some of the examples given, but none seem to give back the correct info, a better example would be to use these and you can see why: insert into unicorns values

(1, 91773, 342, 0),
(2, 91773, 2261, 177),
(3, 91773, 5590, 0),
(4, 91773, 5590, 0),
(5, 91773, 5590, 135),
(6, 91773, 5590, 0),
(7, 91773, 6010, 0),
(8, 91773, 6010, 0),
(9, 91776, 5590, 135),
(10, 91776, 5590, 0),
(11, 91776, 6010, 0),
(12, 91776, 6010, 0);

possible duplicate of [how to delete duplicates on mysql table?](http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table) — John Conde, Feb 28 '13 at 20:24
i think you need to group by zipcode and provider, not just by zipcode, i updated my answer, pls see if it is okay — fthiella, Mar 01 '13 at 16:51

Kermit · Answer 1 · 2013-02-28T20:40:05.873

3

Something like this should work...

DELETE unicorns 
FROM   unicorns 
WHERE  id NOT IN (SELECT id 
                  FROM   (SELECT unicorns.provider, 
                                 Max(id) AS id 
                          FROM   unicorns 
                                 LEFT JOIN (SELECT provider, 
                                                   Max(channel) AS channel 
                                            FROM   unicorns 
                                            GROUP  BY provider) p 
                                        ON p.provider = unicorns.provider 
                                           AND p.channel = unicorns.channel 
                          WHERE  p.provider IS NOT NULL 
                          GROUP  BY unicorns.provider) p2)

See the demo

edited Feb 28 '13 at 20:40

answered Feb 28 '13 at 20:26

Kermit

33,827
13
85
121

1

Thanks so much, I wasn't even close to figuring that out. I'll stick to graphics. :) Thanks again Aarolama! – Anthony Feb 28 '13 at 20:51
@aariloama I tested this and it seemed not to work, it deleted zip codes which it should keep at least one of each zip code; it deleted the exact amount of providers. – Anthony Feb 28 '13 at 21:02
@Anthony Please try the other solutions as well. – Kermit Feb 28 '13 at 21:09
@AarolamaBluenk this query looks correct to me... i took a slightly different approach, but mine is equivalent (thanks for the fiddle, i used it in my answer, i gave a +1 already) – fthiella Mar 01 '13 at 09:26

score 3 · Answer 2 · answered Feb 28 '13 at 20:59

You can perform this using a self join. If you are just doing the delete:

DELETE a
FROM foo a 
  JOIN foo b on (a.zipcode = b.zipcode AND a.provider = b.provider)
WHERE 
  a.channel < b.channel;

If you want to see the data before you delete(what will be left):

SELECT * 
FROM foo
WHERE ID NOT IN (
 SELECT a.id
 FROM foo a 
   JOIN foo b on (a.zipcode = b.zipcode AND a.provider = b.provider)
 WHERE 
   a.channel < b.channel);

SQL Fiddle demo

fthiella · Accepted Answer · 2013-03-01T16:50:14.283

0

I think you could use something like this:

delete from unicorns
where id not in (
  SELECT * FROM (
    SELECT MAX(id)
    FROM unicorns
    WHERE (provider, zipcode, channel) IN (
      SELECT provider, zipcode, max(channel) mx_channel
      FROM unicorns
      GROUP BY provider, zipcode
    )
    GROUP BY provider, zipcode) s)

Please see fiddle here.

edited Mar 01 '13 at 16:50

answered Feb 28 '13 at 21:03

fthiella

48,073
15
90
106

2

My only hesitation is that you're incorrectly using `GROUP BY` – Kermit Mar 01 '13 at 15:31
@AarolamaBluenk why incorrectly? i'm returning the maxumum channel for each provider, and then the maximum id for every provider that has the maximum value... it looks correct to me – fthiella Mar 01 '13 at 16:10
Set your SQL mode to `ONLY_FULL_GROUP_BY` and you will see why. MySQL extends it's functionality for convenience. If you ever move to a different platform, you'll cry. I certainly did. – Kermit Mar 01 '13 at 16:15
@AarolamaBluenk i'm not using non-aggregated columns on the select clause, so it should still work – fthiella Mar 01 '13 at 16:22
@AarolamaBluenk i think that SQL Server doesn't support `(provider, channel) IN ()` when subquery returns more than one column. I like this syntax more than a INNER JOIN because it's easier to understand, but the GROUP BY looks correct http://sqlfiddle.com/#!3/b7471/8 – fthiella Mar 01 '13 at 16:38
@AarolamaBluenk there are three GROUP BY, the last one is wrong, I have only two http://sqlfiddle.com/#!3/b7471/20 – fthiella Mar 01 '13 at 16:47
I am testing this one, the query is taking a while as I have 125,000 records; I'm not sure how long a query like this should take, anyone know? – Anthony Mar 01 '13 at 19:12
This one seems to work with the small test (the others do not), so I gave this the best answer (still waiting for the query to finish on 125,000 records). – Anthony Mar 01 '13 at 22:33
@Anthony you could try to add an index on provider and zipcode, like `ALTER TABLE yourtable ADD INDEX provider_zipcode_idx (provider, zipcode);` it should make the query a lot faster – fthiella Mar 01 '13 at 22:44

MySQL delete duplicates in same table

3 Answers3