0

I am converting our project database from SQL Server to MySQL, the DB conversion has done already.

We have code as below to identify duplicate records based on hashcode and update them as duplicate.

Rank function in MySQL ([Rank function in MySQL) need rank based on age which will start with 1 and increment by 1 for each record. But for me Rank for each hascode should start from 1 and increment by 1 for same hascode, if new hascode comes Rank should start from 1.

update table set Duplicate=1
WHERE id IN 
( SELECT id FROM (
 select  RANK() OVER (PARTITION BY Hashcode ORDER BY  Date asc) R,*
 from table )A where R!=1 )

Below is table structure

CREATE TABLE TBL (
id int(11) NOT NULL AUTO_INCREMENT,
FileName varchar(100) DEFAULT NULL,
date datetime DEFAULT NULL,
hashcode varchar(255) DEFAULT NULL,
FileSize varchar(25) DEFAULT NULL,
IsDuplicate bit(1) DEFAULT NULL,
IsActive bit(1) DEFAULT NULL
PRIMARY KEY (`id`)
)

Please help me to migrate this code to MYSQL.

Community
  • 1
  • 1
Raghunath
  • 594
  • 4
  • 12
  • 27

2 Answers2

1

You don't need to use enumeration for this logic. You just want to set the duplicate flag on everything that is not the minimum date for the hashcode:

update table t join
       (select hashcode, min(date) as mindate
        from table t
        group by hashcode
       ) tt
       on t.hashcode = tt.hashcode and t.date > tt.mindate
    set t.Duplicate = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

MySQL features a rather unique way to delete duplicates:

alter ignore table YourTable 
    add unique index ux_yourtable_hashcode (hashcode);

The trick here is in the ignore option:

If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted.

But there are also other ways. Based on your comment, there is an auto_increment column called id. Since this column is unique and not null, you can use it to distinguish duplicates. You'd need a temporary table to work around the cant specify target table TBL for update in FROM clause error:

create temporary table tmp_originals (id int);

insert  tmp_originals
        (id)
select  min(id)
from    YourTable
group by
        hashcode;

update  YourTable
set     Duplicate = 1
where   id not in (select id from tmp_originals);

The group by query selects the lowest id per group of rows with the same hashcode.

Andomar
  • 232,371
  • 49
  • 380
  • 404
  • I am getting Error Code: 1093. You cant specify target table TBL for update in FROM clause – Raghunath Jul 07 '15 at 12:55
  • Right, looks like [you need a workaround](http://www.codeproject.com/Tips/831164/MySQL-can-t-specify-target-table-for-update-in-FRO). I've updated the answer with a temporary table approach. – Andomar Jul 07 '15 at 13:12