SQL: Deleting row which values already exist

Question

I have a table that look like this:

ID | DATE       | NAME   | VALUE_1 | VALUE_2
1  | 27.11.2015 | Homer  | A       | B
2  | 27.11.2015 | Bart   | C       | B
3  | 28.11.2015 | Homer  | A       | C
4  | 28.11.2015 | Maggie | C       | B
5  | 28.11.2015 | Bart   | C       | B

I currently delete duplicate rows (thank to this thread) using this code :

WITH cte AS
(SELECT ROW_NUMBER() OVER (PARTITION BY [VALUE_1], [VALUE_2]
ORDER BY [DATE] DESC) RN
FROM [MY_TABLE])
DELETE FROM cte
WHERE RN > 1

But this code don't delete exactly the lines I want. I would like to delete only rows which values already exist so in my example I would like to delete only line 5 because line 2 have the same values and is older.

Code to create my table and insert values:

CREATE TABLE [t_diff_values]
([id] INT IDENTITY NOT NULL PRIMARY KEY,
[date] DATETIME NOT NULL,
[name] VARCHAR(255) NOT NULL DEFAULT '',
[val1] CHAR(1) NOT NULL DEFAULT '',
[val2] CHAR(1) NOT NULL DEFAULT '');

INSERT INTO [t_diff_values] ([date], [name], [val1], [val2]) VALUES
('2015-11-27','Homer',  'A','B'),
('2015-11-27','Bart',   'C','B'),
('2015-11-28','Homer',  'A','C'),
('2015-11-28','Maggie', 'C','B'),
('2015-11-28','Bart',   'C','B');

Giorgi Nakeuri · Accepted Answer · 2015-11-30T10:14:06.653

3

You need to add one more CTE where you will index all islands and then apply your duplicate logic in second CTE:

DECLARE @t TABLE
    (
      ID INT ,
      DATE DATE ,
      VALUE_1 CHAR(1) ,
      VALUE_2 CHAR(1)
    )

INSERT  INTO @t
VALUES  ( 1, '20151127', 'A', 'B' ),
        ( 2, '20151128', 'C', 'B' ),
        ( 3, '20151129', 'A', 'B' ),
        ( 4, '20151130', 'A', 'B' );
WITH    cte1
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( ORDER BY date)
                        - ROW_NUMBER() OVER ( PARTITION BY VALUE_1, VALUE_2 ORDER BY DATE) AS gr
               FROM     @t
             ),
        cte2
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( PARTITION BY VALUE_1, VALUE_2, gr ORDER BY date) AS rn
               FROM     cte1
             )
    DELETE  FROM cte2
    WHERE   rn > 1

SELECT  *
FROM    @t

edited Nov 30 '15 at 10:14

answered Nov 30 '15 at 09:07

Giorgi Nakeuri

35,155
8
47
75

I made a mistake, in my initial example I want to delete line 4 but not line 3. Can you please update the code while I change my question :))) – JuGuSm Nov 30 '15 at 10:12
@JuGuSm, I have removed `DESC` in orderings. – Giorgi Nakeuri Nov 30 '15 at 10:14
I'm sorry but when I add the name column it change everything and I can't adapt your code :/ – JuGuSm Nov 30 '15 at 11:25
@JuGuSm, I will have a look. – Giorgi Nakeuri Nov 30 '15 at 11:35
@JuGuSm, change `PARTITION BY VALUE_1, VALUE_2` to `PARTITION BY [name], [val1], [val2]` in two places. Is this helpful? If not describe why? – Giorgi Nakeuri Nov 30 '15 at 11:46
This is what I tried but it doesn't delete any lines :/ rn=1 for every lines and the code is DELETE FROM cte2 WHERE rn > 1 – JuGuSm Nov 30 '15 at 12:12
@JuGuSm, in your sample all rows are distinct, so nothing will be deleted. – Giorgi Nakeuri Nov 30 '15 at 12:16
lines 2 and 5 are the same exept the date but this is the point: I would like to keep the last statut ; the line 5 (28.11.2015) is useless because the value didn't change since the 27.11.2015 – JuGuSm Nov 30 '15 at 14:41
@JuGuSm, initial question was different. Then I think you can use your own code `WITH cte AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Name], [VALUE_1], [VALUE_2] ORDER BY [DATE]) RN FROM [MY_TABLE]) DELETE FROM cte WHERE RN > 1` – Giorgi Nakeuri Nov 30 '15 at 14:50

Julien Vavasseur · Answer 2 · 2015-11-30T10:07:20.650

1

You could use this query:

WITH cte AS
(
    SELECT RN = ROW_NUMBER() OVER (ORDER BY ID) 
    , *
    FROM @data
)
DELETE FROM c1
--SELECT * 
FROM CTE c1
INNER JOIN CTE c2 ON c1.RN +1 = c2.RN AND c1.VALUE_1 = c2.VALUE_1 AND c1.VALUE_2 = c2.VALUE_2

Here I order them by ID. If the next one (RN+1) has similar V1 and V2, it is deleted.

Output:

ID  DATE        VALUE_1 VALUE_2
1   2015-11-27  A       B
2   2015-11-28  C       B
4   2015-11-30  A       B

Data:

declare @data table(ID int, [DATE] date, VALUE_1 char(1), VALUE_2 char(1));
insert into @data(ID, [DATE], VALUE_1, VALUE_2) values
(1, '20151127', 'A', 'B'), 
(2, '20151128', 'C', 'B'), 
(3, '20151129', 'A', 'B'), 
(4, '20151130', 'A', 'B');

edited Nov 30 '15 at 10:07

answered Nov 30 '15 at 09:09

Julien Vavasseur

3,854
1
20
29

I tried your code and I get this error : Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_1'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_1'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_2'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_2'. – JuGuSm Nov 30 '15 at 09:32
I have tried this and it seems to work to. The code is a little shorter and clearer, but performs slightly worse then mine. – Giorgi Nakeuri Nov 30 '15 at 10:22
I haven't check performances. Mine may need to be tweaked if the data requirements are not what they seem to be from the question. Yours seems more versatile. – Julien Vavasseur Nov 30 '15 at 10:27

score 1 · Answer 3 · answered Nov 30 '15 at 09:10

1

Try this

CREATE TABLE [dbo].[Employee](
    [ID] INT NOT NULL,
    [Date] DateTime NOT NULL,
    [VAL1] varchar(20) NOT NULL,
    [VAL2] varchar(20) NOT NULL
)

INSERT INTO [dbo].[Employee] VALUES 
            (1,'2015-11-27 10:44:33.087','A','B')
INSERT INTO [dbo].[Employee] VALUES
            (2,'2015-11-28 10:44:33.087','C','B')
INSERT INTO [dbo].[Employee] VALUES
            (3,'2015-11-29 10:44:33.087','A','B') 
INSERT INTO [dbo].[Employee] VALUES
            (4,'2015-11-30 10:44:33.087','A','B')

with cte as(
    select
        *,
        rn = row_number() over(partition by [VAL1], [VAL2]
ORDER BY [DATE] DESC),
        cc = count(*) over(partition by [VAL1], [VAL2])
    from [Employee]
)

delete
from cte
where
    rn > 1 and rn < cc

select * from [Employee]

answered Nov 30 '15 at 09:10

Sabyasachi Mishra

1,677
2
31
49

2

@JuGuSm, You just accepted incorrect solution! Just try it on `VALUES ( 1, '20151126', 'A', 'B' ), ( 2, '20151127', 'C', 'B' ), ( 3, '20151128', 'A', 'B' ), ( 4, '20151129', 'D', 'B' ), ( 5, '20151130', 'A', 'B' );` All rows are different but this solution deletes rows. – Giorgi Nakeuri Nov 30 '15 at 09:38
@SabyasachiMishra, ordering will not correct the situation. Also ordering in Count is supported only in 2012+ versions – Giorgi Nakeuri Nov 30 '15 at 09:48
@GiorgiNakeuri you're right! I tried and it delete a line while it should not because there isn't two identicals and followed lines. I try your solution with same values. – JuGuSm Nov 30 '15 at 10:01

SQL: Deleting row which values already exist

3 Answers3