MySql remove the rows that have a duplicate column

Question

This is my domains table:

domain       | ip
-------------|-----------
example.com  | 0.0.0.0
-------------|-----------
example1.com | 1.1.1.1
-------------|-----------
example2.com | 2.2.2.2
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------
example2.com | 9.9.9.9
-------------|-----------
example4.com | 4.4.4.4
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------

I want to delete the rows where are duplicate domains and keep the first value of ip, so I must have a table like this finally:

domain       | ip
-------------|-----------
example.com  | 0.0.0.0
-------------|-----------
example1.com | 1.1.1.1
-------------|-----------
example2.com | 2.2.2.2
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------
example4.com | 4.4.4.4
-------------|-----------

Yes. I created a new table with DISTINCT(domain) and after that I tried to run a php script to get for each of that value the first ip with LIMIT 0,1 but the script takes years since I have more than 400.000 rows in the database. I am not an mysql expert — paulalexandru, Jan 20 '17 at 19:50
This is similar to [this question](http://stackoverflow.com/questions/6103212/how-do-i-delete-duplicate-rows-and-keep-the-first-row) and others on StackOverflow — Viet Trang, Jan 20 '17 at 19:52
Possible duplicate of [How do I delete duplicate rows and keep the first row?](http://stackoverflow.com/questions/6103212/how-do-i-delete-duplicate-rows-and-keep-the-first-row) — EoinS, Jan 20 '17 at 20:04

Gurwinder Singh · Accepted Answer · 2017-01-20T20:17:19.487

5

Try using INET_ATON and INET_NTOA with GROUP BY to get least IP for each domain like this:

SELECT 
    domain, INET_NTOA(MIN(INET_ATON(ip)))
FROM
    domains t1
GROUP BY domain;

MIN(IP) will not work the way you'd expect.

You can perform a delete using the above like this:

DELETE t1 FROM domains t1
        INNER JOIN
    (SELECT 
        domain, INET_NTOA(MIN(INET_ATON(ip))) ip
    FROM
        domains t1
    GROUP BY domain) t2 ON t1.domain = t2.domain AND t1.ip <> t2.ip;

Please note that if there are multiple rows with least IP for a domain, all of them will be kept.

You can alternatively create a new table to store distinct rows:

CREATE TABLE domains_new(domain varchar(100), IP varchar(30))
SELECT 
    domain, INET_NTOA(MIN(INET_ATON(ip)))
FROM
    domains t1
GROUP BY domain;

edited Jan 20 '17 at 20:17

answered Jan 20 '17 at 19:54

Gurwinder Singh

38,557
6
51
76

This select seem to be good. The thing is that I want either to remove the rest of the rows, either copy this data into a separate identical table. – paulalexandru Jan 20 '17 at 20:00
The second query does not work. It doesn't delete enought rows. The thing is that the select works. – paulalexandru Jan 20 '17 at 20:09
@paul doesn't work how? I've tried it and it deleted the row `example2.com | 9.9.9.9`. If you are talking about two rows with `example3.com | 3.3.3.3` values, I already mentioned that in the answer, there is no way to delete those, without creating a temporary table, loading distinct data into it, clearing table and loading back or simply select distinct – Gurwinder Singh Jan 20 '17 at 20:10
The delete query does not work. So I have 452616 records. If I make the select I get 421175 records (meaning that 31441 are duplicates). But when I try to run the delete query, I get " 0 rows deleted" – paulalexandru Jan 20 '17 at 20:13
Is there any way to use that select that you made first time and copy that data into another table? I will be fine with that also. – paulalexandru Jan 20 '17 at 20:15
@paul Updated my answer. – Gurwinder Singh Jan 20 '17 at 20:17
Thanks man. This solved my problem. I will just remove the other table and rename this new one and all is fine. – paulalexandru Jan 20 '17 at 20:19

wogsland · Answer 2 · 2017-01-20T19:58:13.020

1

So to delete the dups, assuming your table has an primary key called id,

DELETE FROM domains 
WHERE id IN 
(SELECT dyt.id FROM domains oyt, domains dyt
WHERE oyt.id < dyt.id
AND oyt.domain = dyt.domain
AND oyt.ip = dyt.ip)

edited Jan 20 '17 at 19:58

answered Jan 20 '17 at 19:52

wogsland

9,106
19
57
93

the table does not have any primary id, it's exactly as you see in the question – paulalexandru Jan 20 '17 at 19:52
Then creating a new table using `DISTINCT` is probably your best bet. – wogsland Jan 20 '17 at 19:53

Viet Trang · Answer 3 · 2017-01-20T20:41:56.953

1

if you assign each row a unique id

alter table domains add column id int first;
set @i = 0;
update domains set id=(@i:=@i+1);

then you may be able to do something like this:

delete from domains
where id not in (select id from 
(select id, domain, ip from domains group by domain having count(domain) > 1) as subq);

then just remove the key column

alter table domains drop column id;

edited Jan 20 '17 at 20:41

answered Jan 20 '17 at 20:24

Viet Trang

51
5

score 0 · Answer 4 · answered Jan 20 '17 at 20:10

0

create the second collection with same structure and try this.

INSERT INTO second_table SELECT DISTINCT * FROM domains

answered Jan 20 '17 at 20:10

mikhil mohanan

147
3
11

score 0 · Answer 5 · edited Jan 20 '17 at 21:36

0

This should work:

WITH result AS (
    SELECT Domain,
           Ip,
           ROW_NUMBER() OVER (PARTITION BY p.Domain 
                                 ORDER BY p.Ip) AS rk
      FROM DomainsTable p)
SELECT r.Domain, r.Ip
  FROM result r
WHERE r.rk = 1

edited Jan 20 '17 at 21:36

Nisse Engström

4,738
23
27
42

answered Jan 20 '17 at 20:35

Alan Machado

3
2

MySql remove the rows that have a duplicate column

5 Answers5