1

I've an 'orders' table structure like this which contains 100,000 records:

date           orderid   type      productsales  other
01-Aug-2014    11        order     118           10.12    
01-Aug-2014    11        order     118           10.12
18-Aug-2014    11        order     35            4.21
22-Aug-2014    11        Refund    -35           -4.21
09-Sep-2014    12        order     56            7.29
15-Sep-2014    12        refund    -56           -7.29
23-Oct-2014    13        order     25            2.32
26-Oct-2014    13        refund    -25           -2.32

Now, what I want to achieve is to delete those duplicate row from my table where the orderid, type, productsales and other columns values are same to each other and keep only one row (look at the first two records for the orderid of 11).

But if the 'orderids' are same for the two records of the same 'type' of order, but the 'productsales' and 'other' columns values are different then don't delete those rows. I hope I clarified my point.

I'm looking for a mysql delete query to perform this task.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Faizan Afzal
  • 413
  • 2
  • 11
  • 19
  • What if date is different? Do you want to delete one of them? Which one do you want to keep, the oldest or the newest? – David P Feb 13 '15 at 12:56
  • I don't think `DELETE` will work here. Better/easier approach would be to `SELECT DISTINCT` the respective rows into a temp table, delete the old table and then rename the temp table accordingly. – Sirko Feb 13 '15 at 12:56
  • `DELETE ... LIMIT 1` should do the trick: http://dev.mysql.com/doc/refman/5.0/en/delete.html – Bernhard Döbler Feb 13 '15 at 12:57
  • @DavidP date will also be the same for both the rows – Faizan Afzal Feb 13 '15 at 12:58
  • 2
    If the rows are completely equal, a delete won't work. Either use a cursor, or follow Sirko's temporary table advise. Next time, use primary keys and other constraints to avoid this mess. – jarlh Feb 13 '15 at 13:00
  • @jarlh actually, I didn't design the table structure like this, I'm fixing an already built application and this is the task which I needed to perform to remove the duplicates – Faizan Afzal Feb 13 '15 at 13:02

5 Answers5

1

You should add an id column. If you don't want to use a temp table, you could probably do something like this (I have NOT tested this, so...):

ALTER TABLE 'orders'
ADD COLUMN 'id' INT NOT NULL AUTO_INCREMENT FIRST, ADD PRIMARY KEY Id(id)

DELETE 
FROM orders INNER JOIN 
(
   SELECT TOP 1 id
   FROM orders
   WHERE COUNT(DISTINCT date,orderid,type.productsales,other) > 1
) dupes
ON orders.id = dupes.id
David P
  • 2,027
  • 3
  • 15
  • 27
1

May be its duplicate question to this: MySql: remove table rows depending on column duplicate values?

You can seek for the answer there.

The solution there specify that adding unique index on your possible duplicate columns with IGNORE keyword will remove all duplicates row.

ALTER IGNORE TABLE `table` ADD UNIQUE INDEX `name` (`col1`, `col2`, `col3`);

Here I also want to mention some points:

  1. unique index does not make change in row if any columns(from index, like here 3 columns) have null as value. Ex: null,1,"asdsa" can be stored twice
  2. same way if you have single column in unique index then multiple rows with null values(for that column) will remains in table
  3. IGNORE keywords id depreciated now, it will not work after MySQL 5.6(may be). Now only option is to create new table by a query like this:

CREATE TABLE <table_name> AS SELECT * FROM <your_table> GROUP BY col1,col2,col3;

After that you can delete <your_table> and rename <table_name> to your table.

Here you can change the column list in Group By clause according to your need(from all columns to one column, or few columns which have duplicate values together).

The plus point is, it will work with null values also.

Community
  • 1
  • 1
Adarsh Rajput
  • 1,246
  • 14
  • 24
0

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

ALTER IGNORE TABLE orders ADD UNIQUE INDEX idx_name (orderid, type, productsales, other);

This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...

I hope this can help you.

Ankit Bajpai
  • 13,128
  • 4
  • 25
  • 40
0

try this. create temp table such as temp and stored unique data,

SELECT distinct * into temp FROM Orders

then delete records of orders table table as

 DELETE FROM orders

after deleted all records insert records temp into records.

INSER into RECORDS SELECT * FROM TEMP DROP TABLE TEMP 
A. S. Mahadik
  • 585
  • 1
  • 5
  • 17
0

If you have completely duplicated rows, and you want to do this in SQL, then perhaps the best method is to save the rows you want in a temporary table, truncate the table, and insert the data back in:

create temporary table temp_orders as
    select distinct *
    from orders;

truncate table orders;

alter table orders add orderid int not null primary key auto_increment;

insert into orders;
    select *
    from temp_orders;

Oh, look, I also added an auto-incrementing primary key so you won't have this problem in the future. This would be a simpler process if you have a unique key on each row.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786