1

I'm trying to delete some rows from my Mysql table, when one key is duplicate (here "url") and keep a particular one (here the smallest "key1" and "key2")

Example :

Table t1
Id    Url    Key1    Key2
1     A.com   10      10
2     B.com   20      25
3     B.com   21      25
4     C.com   35      35
5     C.com   35      37

Desired output is :

Table t1
Id    Url    Key1    Key2
1     A.com   10      10
3     B.com   21      25
5     C.com   35      37

So the query (if it exists) should look like :

  • Select rows where Url are duplicate
  • Then sort by Key1 and remove the row where Key1 is strictly inferior
  • if Key1 are equal, remove the row where Key2 is inferior

Thanks

Vincent
  • 1,534
  • 3
  • 20
  • 42
  • possible duplicate of [How to delete duplicates on a MySQL table?](http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-a-mysql-table) – Norbert Sep 20 '15 at 15:31

2 Answers2

1

You want to keep the rows where key1, key2 are maximal. An easy way to express this is:

delete t1 from table t1
    where exists (select 1
                  from t1 t11
                  where t11.url = t1.url and
                        (t11.key1 > t1.key1 or
                         t11.key1 = t1.key1 and t11.key2 > t1.key2
                        )
                 );

Alas, MySQL doesn't allow this construct, because you using the table being deleted. So, you can do something like this:

delete t1
    from t1 left join
         (select t.*,
                 (select max(key2)
                  from t1
                  where t1.url = t.url and t1.key = t.maxkey1
                 ) as maxkey2
          from (select url, max(key1) as maxkey1
                from t1
                group by url
               ) t
        ) t
        on t1.url = t.url and t1.key1 = t.maxkey1 and t2.key2 = t.maxkey2
    where t.url is null;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

I think this might be helpful

DELETE t1
  FROM t1 as tb1
  join t1 as tb2
 WHERE tb1.url= tb2.url
   and tb1.id < tb2.id

This way you keep the record with the max value on id column

but if you just want to fetch records

SELECT distinct tb1.*
  FROM t1 as tb1
  join t1 as tb2
 WHERE tb1.url= tb2.url
   and tb1.id < tb2.id
urfusion
  • 5,528
  • 5
  • 50
  • 87