How to delete only the reduntant data?

Question

If I have table like this:

ID      FROM_DATE           TO_DATE
18177   2016-04-20 00:00    2016-04-20 00:00
18177   2016-04-20 00:00    2016-04-20 00:00
18177   2016-04-23 00:00    2016-04-23 00:00
18177   2016-04-23 00:00    2016-04-23 00:00
18177   2016-04-24 00:00    2016-04-24 00:00
18177   2016-04-24 00:00    2016-04-24 00:00
18177   2016-04-26 00:00    2016-04-26 00:00
18177   2016-04-26 00:00    2016-04-26 00:00
18177   2016-04-27 00:00    2016-04-27 00:00
18177   2016-04-27 00:00    2016-04-27 00:00
18177   2016-04-30 00:00    2016-04-30 00:00
18177   2016-04-30 00:00    2016-04-30 00:00

This table has no key, and I found my self need to set primary key composed of the three columns, but I can't do this because the data duplication.

How to delete the redundant data? For example, if I have two records with the same data, I want to keep only one record?

I want the result like this to fix the problem and set the key:

ID      FROM_DATE           TO_DATE
18177   2016-04-20 00:00    2016-04-20 00:00
18177   2016-04-23 00:00    2016-04-23 00:00
18177   2016-04-24 00:00    2016-04-24 00:00
18177   2016-04-26 00:00    2016-04-26 00:00
18177   2016-04-27 00:00    2016-04-27 00:00
18177   2016-04-30 00:00    2016-04-30 00:00

Create a copy table, insert into copy select distinct * from originaltable. Rename — jarlh, Apr 26 '16 at 11:48
Have a read of this: http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows?rq=1 — StevieG, Apr 26 '16 at 14:05

score 3 · Accepted Answer · edited Apr 26 '16 at 14:02

3

Does this work?

delete from t
    where rowid not in (select min(rowid)
                        from t
                        group by id, from_date, to_date
                       );

By default, Informix tables have a built-in rowid column. If this version does work, you can put the subquery into a temporary table and use that instead.

Also note: It is often faster to do it this way:

select distinct id, from_date, to_date
from t
into temp temp_t;

truncate table t;

commit;  -- If there was a transaction active

insert into t(id, from_date, to_date)
    select id, from_date, to_date
    from temp_t;

This can be faster because of logging and locking considerations with massive deletes.

edited Apr 26 '16 at 14:02

Jonathan Leffler

730,956
141
904
1,278

answered Apr 26 '16 at 12:04

Gordon Linoff

1,242,037
58
646
786

i try to select firstly like this : `select * from t where rowid not in (select min(rowid) from t group by id, from_date, to_date );` when i use unique i get less rows !! i have more columns btw – Anyname Donotcare Apr 26 '16 at 12:23
@anyname-donotcare and these other columns have different values for each row? Is it possible that for the same `id`, `from_date`and `to_date` the other columns can be different across rows? If so, how would you choose which row to keep? – Ricardo Henriques Apr 26 '16 at 13:11
@anyname-donotcare Can you post the: complete schema of the table in question; result of `SELECT COUNT(*) FROM t`; result of `SELECT COUNT(*) FROM t GROUP BY id, from_date, to_date`; result of `SELECT COUNT(*) FROM (SELECT UNIQUE * FROM t)`. – Ricardo Henriques Apr 26 '16 at 13:12
Now it works fine , Could i know how to do the same in sql server please – Anyname Donotcare Apr 26 '16 at 13:45

score 1 · Answer 2 · edited Apr 26 '16 at 14:30

DELETE FROM Table_Name
WHERE ROWID NOT IN (SELECT MAX(ROWID) FROM Table_Name
                    GROUP BY Id,From_Date,To_Date
                   );

First it will execute the inner Query and it will create different groups depending on the Id,From_Date & To_Date combined. In this way all the duplicate records belongs to same group only and then filter the distinct records by selecting only the maximum rowid among each group. Then outer query will execute and it will delete the duplicate records.

ROWID is unique for each records inserted into the database i.e provided by DBMS itself internally.

How to delete only the reduntant data?

2 Answers2