remove duplicate data from sql

Question

I have a sql database named "data" and a table "disk", where there are 5 columns

CREATE TABLE `disk` (
  `id` int(11) NOT NULL,
  `title` text COLLATE utf8_unicode_ci NOT NULL,
  `link` text COLLATE utf8_unicode_ci NOT NULL,
  `mag` text COLLATE utf8_unicode_ci NOT NULL,
  `size` varchar(10) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

the "mag" column has a some of the duplicates.

and I want to delete the complete row where mag column is same.

note: let's say mag column has 1,2,3,4,4,5.... I want to delete a single 4 from it which is duplicate. means I don't want to completely delete both the 4. one "4" must be kept.

What should the query for it look like?

Consider to include sql for DDL and sample data in your questions, so it will help other members give you answer faster and more precise — Alexander Volok, Apr 20 '18 at 09:35

score 6 · Answer 1 · edited Apr 20 '18 at 10:52

6

try this below to delete duplicate with same d column and keep one row with lowest id value: :

DELETE d1 
FROM  disk d1, disk d2 
WHERE d1.id > d2.id AND 
      d1.d = d2.d;

edited Apr 20 '18 at 10:52

Yogesh Sharma

49,870
5
26
52

answered Apr 20 '18 at 09:33

Klodian

633
6
18

Nishant Gupta · Answer 2 · 2018-04-23T11:00:23.127

1

Try this:

DELETE disk
FROM disk
INNER JOIN (
       SELECT id,
               d,
               CASE WHEN d = @prevd 
                       THEN @id:=@id+1
                    ELSE @id:=1
               END AS rankNum,
               @prevd:=d AS prd
       FROM disk, (SELECT @prevd:=NULL,@id:=NULL) t
      ) t1
ON disk.id = t1.id
WHERE rankNum >= 2;

For Demo Follow the below link:

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=318e94a135853fcd15b14e4b8bbf1fdc

edited Apr 23 '18 at 11:00

answered Apr 20 '18 at 09:32

Nishant Gupta

3,533
1
11
18

d is a text field and second thing is it compiled without error but it didnt remove any row... 0 rows affected – user9673975 Apr 20 '18 at 11:52
CREATE TABLE `disk` ( `id` int(11) NOT NULL, `title` text COLLATE utf8_unicode_ci NOT NULL, `link` text COLLATE utf8_unicode_ci NOT NULL, `mag` text COLLATE utf8_unicode_ci NOT NULL, `size` varchar(10) COLLATE utf8_unicode_ci NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; this my actual table – user9673975 Apr 20 '18 at 12:00
I edited my answer try it again and show me how it's not working. @user9673975 – Nishant Gupta Apr 23 '18 at 11:01

Ankit Agrawal · Answer 3 · 2018-04-20T09:38:47.450

0

delete from disk
where id  in 
(
select id 
from task
group by id 
having count(id) >1
)

edited Apr 20 '18 at 09:38

answered Apr 20 '18 at 09:36

Ankit Agrawal

2,426
1
13
27

1

Will not delete any rows at all. – jarlh Apr 20 '18 at 09:38
Thank you for this code snippet, which might provide some limited short-term help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its long-term value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – Toby Speight Apr 20 '18 at 10:31

DEarTh · Accepted Answer · 2018-04-21T04:59:04.870

You can do the following..
Creating new table and keeping random row :

first copy table disk(unique data) to temp table disk2.
drop table disk.

rename temp table disk2 to disk.

create table disk2 select * from disk group by d;

drop table disk;

rename table disk2 to disk;

NOTE : Here we using group by with * because OP does not care which row to keep.

Creating new table and keeping row with min or max id : Another way to do this while keeping row with min or max id

/*copy data from disk to temp table disk2*/
create table disk2 select * from disk
    where id in (select min(id) from disk group by d);
/*drop table disk*/
drop table disk;
/*rename temp table to disk*/
rename table disk2 to disk;

UPDATE: Another way to do this
Deleting duplicates from existing table

    /*first create a dups table for duplicates*/
    create table dups select * from disk
        where id not in (select min(id) from disk group by d);
    /*now delete all rows which are present in dups table*/
    delete from disk where id in (select id from dups);
    /*now delete the dups table*/
    drop table dups;

it worked well but still has a problem... my server does not allow me to make more than 5000 changes in the database in an hour. so creating a new table with around 10000 rows and then renaming the table, would not work... — user9673975, Apr 20 '18 at 11:57

score 0 · Answer 5 · answered Apr 20 '18 at 12:33

No need to create any temporary tables

I hope this will work for you

DELETE ColumnName
FROM TableName
INNER JOIN 
(
    SELECT  MAX(ID) AS ID
    FROM TableName
    GROUP BY ID
    HAVING COUNT(*) > 1
) Duplicate on Duplicate.ID = TableName.ID
WHERE TableName.ID < Duplicate.lastId;

Please also check the following link your for more suggestions

MySQL delete duplicate records but keep latest

remove duplicate data from sql

5 Answers5

Linked