1

This is my domains table:

domain       | ip
-------------|-----------
example.com  | 0.0.0.0
-------------|-----------
example1.com | 1.1.1.1
-------------|-----------
example2.com | 2.2.2.2
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------
example2.com | 9.9.9.9
-------------|-----------
example4.com | 4.4.4.4
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------

I want to delete the rows where are duplicate domains and keep the first value of ip, so I must have a table like this finally:

domain       | ip
-------------|-----------
example.com  | 0.0.0.0
-------------|-----------
example1.com | 1.1.1.1
-------------|-----------
example2.com | 2.2.2.2
-------------|-----------
example3.com | 3.3.3.3
-------------|-----------
example4.com | 4.4.4.4
-------------|-----------
paulalexandru
  • 9,218
  • 7
  • 66
  • 94
  • 1
    Have you tried something yet? – Gurwinder Singh Jan 20 '17 at 19:48
  • Yes. I created a new table with DISTINCT(domain) and after that I tried to run a php script to get for each of that value the first ip with LIMIT 0,1 but the script takes years since I have more than 400.000 rows in the database. I am not an mysql expert – paulalexandru Jan 20 '17 at 19:50
  • 1
    This is similar to [this question](http://stackoverflow.com/questions/6103212/how-do-i-delete-duplicate-rows-and-keep-the-first-row) and others on StackOverflow – Viet Trang Jan 20 '17 at 19:52
  • 1
    Possible duplicate of [How do I delete duplicate rows and keep the first row?](http://stackoverflow.com/questions/6103212/how-do-i-delete-duplicate-rows-and-keep-the-first-row) – EoinS Jan 20 '17 at 20:04

5 Answers5

5

Try using INET_ATON and INET_NTOA with GROUP BY to get least IP for each domain like this:

SELECT 
    domain, INET_NTOA(MIN(INET_ATON(ip)))
FROM
    domains t1
GROUP BY domain;

MIN(IP) will not work the way you'd expect.

You can perform a delete using the above like this:

DELETE t1 FROM domains t1
        INNER JOIN
    (SELECT 
        domain, INET_NTOA(MIN(INET_ATON(ip))) ip
    FROM
        domains t1
    GROUP BY domain) t2 ON t1.domain = t2.domain AND t1.ip <> t2.ip;

Please note that if there are multiple rows with least IP for a domain, all of them will be kept.

You can alternatively create a new table to store distinct rows:

CREATE TABLE domains_new(domain varchar(100), IP varchar(30))
SELECT 
    domain, INET_NTOA(MIN(INET_ATON(ip)))
FROM
    domains t1
GROUP BY domain;
Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
  • This select seem to be good. The thing is that I want either to remove the rest of the rows, either copy this data into a separate identical table. – paulalexandru Jan 20 '17 at 20:00
  • The second query does not work. It doesn't delete enought rows. The thing is that the select works. – paulalexandru Jan 20 '17 at 20:09
  • @paul doesn't work how? I've tried it and it deleted the row `example2.com | 9.9.9.9`. If you are talking about two rows with `example3.com | 3.3.3.3` values, I already mentioned that in the answer, there is no way to delete those, without creating a temporary table, loading distinct data into it, clearing table and loading back or simply select distinct – Gurwinder Singh Jan 20 '17 at 20:10
  • The delete query does not work. So I have 452616 records. If I make the select I get 421175 records (meaning that 31441 are duplicates). But when I try to run the delete query, I get " 0 rows deleted" – paulalexandru Jan 20 '17 at 20:13
  • Is there any way to use that select that you made first time and copy that data into another table? I will be fine with that also. – paulalexandru Jan 20 '17 at 20:15
  • @paul Updated my answer. – Gurwinder Singh Jan 20 '17 at 20:17
  • Thanks man. This solved my problem. I will just remove the other table and rename this new one and all is fine. – paulalexandru Jan 20 '17 at 20:19
1

So to delete the dups, assuming your table has an primary key called id,

DELETE FROM domains 
WHERE id IN 
(SELECT dyt.id FROM domains oyt, domains dyt
WHERE oyt.id < dyt.id
AND oyt.domain = dyt.domain
AND oyt.ip = dyt.ip)
wogsland
  • 9,106
  • 19
  • 57
  • 93
1

if you assign each row a unique id

alter table domains add column id int first;
set @i = 0;
update domains set id=(@i:=@i+1);

then you may be able to do something like this:

delete from domains
where id not in (select id from 
(select id, domain, ip from domains group by domain having count(domain) > 1) as subq);

then just remove the key column

alter table domains drop column id;
Viet Trang
  • 51
  • 5
0

create the second collection with same structure and try this.

INSERT INTO second_table SELECT DISTINCT * FROM domains
mikhil mohanan
  • 147
  • 3
  • 11
0

This should work:

WITH result AS (
    SELECT Domain,
           Ip,
           ROW_NUMBER() OVER (PARTITION BY p.Domain 
                                 ORDER BY p.Ip) AS rk
      FROM DomainsTable p)
SELECT r.Domain, r.Ip
  FROM result r
WHERE r.rk = 1
Nisse Engström
  • 4,738
  • 23
  • 27
  • 42