3

I have a table that look like this:

ID | DATE       | NAME   | VALUE_1 | VALUE_2
1  | 27.11.2015 | Homer  | A       | B
2  | 27.11.2015 | Bart   | C       | B
3  | 28.11.2015 | Homer  | A       | C
4  | 28.11.2015 | Maggie | C       | B
5  | 28.11.2015 | Bart   | C       | B

I currently delete duplicate rows (thank to this thread) using this code :

WITH cte AS
(SELECT ROW_NUMBER() OVER (PARTITION BY [VALUE_1], [VALUE_2]
ORDER BY [DATE] DESC) RN
FROM [MY_TABLE])
DELETE FROM cte
WHERE RN > 1

But this code don't delete exactly the lines I want. I would like to delete only rows which values already exist so in my example I would like to delete only line 5 because line 2 have the same values and is older.

Code to create my table and insert values:

CREATE TABLE [t_diff_values]
([id] INT IDENTITY NOT NULL PRIMARY KEY,
[date] DATETIME NOT NULL,
[name] VARCHAR(255) NOT NULL DEFAULT '',
[val1] CHAR(1) NOT NULL DEFAULT '',
[val2] CHAR(1) NOT NULL DEFAULT '');

INSERT INTO [t_diff_values] ([date], [name], [val1], [val2]) VALUES
('2015-11-27','Homer',  'A','B'),
('2015-11-27','Bart',   'C','B'),
('2015-11-28','Homer',  'A','C'),
('2015-11-28','Maggie', 'C','B'),
('2015-11-28','Bart',   'C','B');
Community
  • 1
  • 1
JuGuSm
  • 33
  • 7

3 Answers3

3

You need to add one more CTE where you will index all islands and then apply your duplicate logic in second CTE:

DECLARE @t TABLE
    (
      ID INT ,
      DATE DATE ,
      VALUE_1 CHAR(1) ,
      VALUE_2 CHAR(1)
    )

INSERT  INTO @t
VALUES  ( 1, '20151127', 'A', 'B' ),
        ( 2, '20151128', 'C', 'B' ),
        ( 3, '20151129', 'A', 'B' ),
        ( 4, '20151130', 'A', 'B' );
WITH    cte1
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( ORDER BY date)
                        - ROW_NUMBER() OVER ( PARTITION BY VALUE_1, VALUE_2 ORDER BY DATE) AS gr
               FROM     @t
             ),
        cte2
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( PARTITION BY VALUE_1, VALUE_2, gr ORDER BY date) AS rn
               FROM     cte1
             )
    DELETE  FROM cte2
    WHERE   rn > 1

SELECT  *
FROM    @t
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
  • I made a mistake, in my initial example I want to delete line 4 but not line 3. Can you please update the code while I change my question :))) – JuGuSm Nov 30 '15 at 10:12
  • @JuGuSm, I have removed `DESC` in orderings. – Giorgi Nakeuri Nov 30 '15 at 10:14
  • I'm sorry but when I add the name column it change everything and I can't adapt your code :/ – JuGuSm Nov 30 '15 at 11:25
  • @JuGuSm, I will have a look. – Giorgi Nakeuri Nov 30 '15 at 11:35
  • @JuGuSm, change `PARTITION BY VALUE_1, VALUE_2` to `PARTITION BY [name], [val1], [val2]` in two places. Is this helpful? If not describe why? – Giorgi Nakeuri Nov 30 '15 at 11:46
  • This is what I tried but it doesn't delete any lines :/ rn=1 for every lines and the code is DELETE FROM cte2 WHERE rn > 1 – JuGuSm Nov 30 '15 at 12:12
  • @JuGuSm, in your sample all rows are distinct, so nothing will be deleted. – Giorgi Nakeuri Nov 30 '15 at 12:16
  • lines 2 and 5 are the same exept the date but this is the point: I would like to keep the last statut ; the line 5 (28.11.2015) is useless because the value didn't change since the 27.11.2015 – JuGuSm Nov 30 '15 at 14:41
  • @JuGuSm, initial question was different. Then I think you can use your own code `WITH cte AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Name], [VALUE_1], [VALUE_2] ORDER BY [DATE]) RN FROM [MY_TABLE]) DELETE FROM cte WHERE RN > 1` – Giorgi Nakeuri Nov 30 '15 at 14:50
1

You could use this query:

WITH cte AS
(
    SELECT RN = ROW_NUMBER() OVER (ORDER BY ID) 
    , *
    FROM @data
)
DELETE FROM c1
--SELECT * 
FROM CTE c1
INNER JOIN CTE c2 ON c1.RN +1 = c2.RN AND c1.VALUE_1 = c2.VALUE_1 AND c1.VALUE_2 = c2.VALUE_2

Here I order them by ID. If the next one (RN+1) has similar V1 and V2, it is deleted.

Output:

ID  DATE        VALUE_1 VALUE_2
1   2015-11-27  A       B
2   2015-11-28  C       B
4   2015-11-30  A       B

Data:

declare @data table(ID int, [DATE] date, VALUE_1 char(1), VALUE_2 char(1));
insert into @data(ID, [DATE], VALUE_1, VALUE_2) values
(1, '20151127', 'A', 'B'), 
(2, '20151128', 'C', 'B'), 
(3, '20151129', 'A', 'B'), 
(4, '20151130', 'A', 'B');
Julien Vavasseur
  • 3,854
  • 1
  • 20
  • 29
  • I tried your code and I get this error : Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_1'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_1'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_2'. Msg 207, Level 16, State 1, Line 17 Invalid column name 'VALUE_2'. – JuGuSm Nov 30 '15 at 09:32
  • I have tried this and it seems to work to. The code is a little shorter and clearer, but performs slightly worse then mine. – Giorgi Nakeuri Nov 30 '15 at 10:22
  • I haven't check performances. Mine may need to be tweaked if the data requirements are not what they seem to be from the question. Yours seems more versatile. – Julien Vavasseur Nov 30 '15 at 10:27
1

Try this

CREATE TABLE [dbo].[Employee](
    [ID] INT NOT NULL,
    [Date] DateTime NOT NULL,
    [VAL1] varchar(20) NOT NULL,
    [VAL2] varchar(20) NOT NULL
)

INSERT INTO [dbo].[Employee] VALUES 
            (1,'2015-11-27 10:44:33.087','A','B')
INSERT INTO [dbo].[Employee] VALUES
            (2,'2015-11-28 10:44:33.087','C','B')
INSERT INTO [dbo].[Employee] VALUES
            (3,'2015-11-29 10:44:33.087','A','B') 
INSERT INTO [dbo].[Employee] VALUES
            (4,'2015-11-30 10:44:33.087','A','B')

with cte as(
    select
        *,
        rn = row_number() over(partition by [VAL1], [VAL2]
ORDER BY [DATE] DESC),
        cc = count(*) over(partition by [VAL1], [VAL2])
    from [Employee]
)

delete
from cte
where
    rn > 1 and rn < cc

select * from [Employee]
Sabyasachi Mishra
  • 1,677
  • 2
  • 31
  • 49
  • 2
    @JuGuSm, You just accepted incorrect solution! Just try it on `VALUES ( 1, '20151126', 'A', 'B' ), ( 2, '20151127', 'C', 'B' ), ( 3, '20151128', 'A', 'B' ), ( 4, '20151129', 'D', 'B' ), ( 5, '20151130', 'A', 'B' );` All rows are different but this solution deletes rows. – Giorgi Nakeuri Nov 30 '15 at 09:38
  • @SabyasachiMishra, ordering will not correct the situation. Also ordering in Count is supported only in 2012+ versions – Giorgi Nakeuri Nov 30 '15 at 09:48
  • @GiorgiNakeuri you're right! I tried and it delete a line while it should not because there isn't two identicals and followed lines. I try your solution with same values. – JuGuSm Nov 30 '15 at 10:01