0

I am trying to delete duplicate records but getting some error.

Can you please help me how to do this?

columns:

dataSetName, dataSetType, colPosition, Name, dataType, keyCol, createUser, createDate, updateUser, updateDate, isActive, discription, sourceFieldName

enter image description here

sql query :

WITH cte AS (
    SELECT 
        dataSetName,
        dataSetType,
        colPosition,
        Name,
        dataType,
        keyCol,
        createUser,
        createDate,
        updateUser,
        updateDate,
        isActive,
        discription,
        sourceFieldName,
        ROW_NUMBER() OVER (
            PARTITION BY 
                colPosition
            ORDER BY 
                colPosition
        ) row_num
     FROM 
        cde_prod_qa.transformation_source_metadata
)
DELETE FROM cte
WHERE row_num > 1;
shivi
  • 15
  • 1
  • 6

3 Answers3

0

You can use COUNT() with window function :

WITH cte AS (
    SELECT dataSetName, dataSetType, colPosition, Name, dataType,
           keyCol, createUser, createDate, updateUser,
           updateDate, isActive, discription, sourceFieldName,
           COUNT(*) OVER (PARTITION BY colPosition) CNT
     FROM cde_prod_qa.transformation_source_metadata
)
DELETE 
FROM cte
WHERE CNT > 1;
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'cte AS ( SELECT dataSetName, dataSetType, colPosition, Name, dataType, ' at line 1 – shivi Feb 05 '20 at 10:21
  • *You can use COUNT() with window function :* Not by this way - in it ALL duplicates will be dropped whereas OP needs all except one (it seems...). – Akina Feb 05 '20 at 10:28
  • @Akina i just copied your given query and ran over there. – shivi Feb 05 '20 at 10:28
  • @Akina so what is the solution then? – shivi Feb 05 '20 at 10:29
0

I don't think that MySQL supports updateable common table expressions (unlike other databses such as SQL Server or Postgres).

In absence of a primary key, the simplest method would be to copy the rows that you want to retain in a separate table using select distinct, then truncate the original table and copy back into it:

create table my_temp_table as
select distinct 
    dataSetName,
    dataSetType,
    colPosition,
    Name,
    dataType,
    keyCol,
    createUser,
    createDate,
    updateUser,
    updateDate,
    isActive,
    discription,
    sourceFieldName
from cde_prod_qa.transformation_source_metadata;

truncate cde_prod_qa.transformation_source_metadata;

insert into cde_prod_qa.transformation_source_metadata
select * from my_temp_table;

drop table my_temp_table;

Make sure that you backup your table before truncating it!

Once this is done, you should really consider creating a unique constraint on colPosition to prevent further duplicates to happen.

GMB
  • 216,147
  • 25
  • 84
  • 135
0

As I understand the duplicated records are FULL duplicates - i.e. the values in ALL fields are identical.

In such case the simplest solution depends on the amount of total records and duplicated records relation.

If the relation is high (the amount of duplicates is relatively low) then the simplest solution seems to be the next: create temporary table, copy all records which have more than one copy into it, delete all duplicates, copy them from temporary table. Demonstration. The problem - you must specify all fields list in GROUP BY clause.

If this relation is high enough (too many duplicates) then use @GMB's solution. Or its variation:

CREATE TEMPORARY TABLE my_temp_table LIKE cde_prod_qa.transformation_source_metadata;
ALTER TABLE cde_prod_qa.transformation_source_metadata ADD PRIMARY KEY (colPosition);
INSERT IGNORE INTO my_temp_table SELECT * FROM cde_prod_qa.transformation_source_metadata;
TRUNCATE cde_prod_qa.transformation_source_metadata;
INSERT INTO cde_prod_qa.transformation_source_metadata SELECT * FROM my_temp_table;

The possible problem (for both variants) - some triggers exists on this table. In such case you must delete them (in MySQL triggers cannot be disabled) and re-create after duplicates removing.

Akina
  • 39,301
  • 5
  • 14
  • 25