4

I have a table with the following schema:

ID NVARCHAR(10)
Message NVARCHAR(300)
UpdateTime DATETIME

where ID is a foreign key. New message records are added in clusters with the same UpdateTime. An example would be (sorted by ID and UpdateTime):

ID   | Status        | UpdateTime
---------------------------------
42     Cluster1-Msg1   2012-12-25
42     Cluster1-Msg2   2012-12-25
42     Cluster2-Msg1   2013-10-10
42     Cluster2-Msg2   2013-10-10
43     Cluster4-Msg1   2011-11-27

Here ID #42 is associated with 4 messages, clustered in two groups on different dates, while ID #43 is only associated with one message.

From time to time I wish to purge this table by, for each group with the same ID, deleting all message records whose UpdateTime is less than the maximum within the group. The end result in the example above would be:

42     Cluster2-Msg1   2013-10-10
42     Cluster2-Msg2   2013-10-10
43     Cluster4-Msg1   2011-11-27

The following SQL query locates all the records I want to delete:

SELECT Msgs.ID, Msgs.UpdateTime
FROM Messages Msgs
JOIN
(SELECT ID, MAX(UpdateTime) AS MaxTime FROM Messages GROUP BY ID) MaxTimes
ON Msgs.ID = MaxDates.ID
WHERE Msgs.UpdateTime < MaxTimes.MaxTime

Now I wish to write a DELETE statement that removes the items that match those returned by the query above. The records must be removed based on the ID and UpdateTime values. I'm just not understanding how to express this in Transact-SQL.

John Källén
  • 7,551
  • 31
  • 64
  • What if you have a duplicate? You want to keep both or just one? – paparazzo Oct 10 '13 at 12:53
  • Think of the table as containing groups of messages (group key is the ID column). I want to remove any "old" messages, that is messages that have dates less the maximum date within each group. – John Källén Oct 10 '13 at 12:54
  • You can also use `ROW_NUMBER` for this. – Martin Smith Oct 10 '13 at 12:55
  • Here I'm looking to delete records that don't match certain criteria (older than the maximum date), rather than deleting duplicates. It so happens that the accepted answer to that other question is implemented in a similar way. – John Källén Oct 10 '13 at 13:02
  • 1
    @JohnKällén - No it is the same thing. Where a duplicate row is defined as having the same value for `ID`. In the linked question a duplicate is defined as having the same value for `Col1,Col2,Col3` but potentialy different in other columns. – Martin Smith Oct 10 '13 at 13:09
  • So the `ROW_NUMBER` version is `;WITH cte AS (SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UpdateTime DESC) AS RN FROM Msgs) DELETE FROM cte WHERE RN > 1` – Martin Smith Oct 10 '13 at 13:15

2 Answers2

4
DELETE Msgs 
FROM Messages Msgs
JOIN (SELECT ID, MAX(UpdateTime) AS MaxTime FROM Messages GROUP BY ID) MaxTimes
    ON Msgs.ID = MaxDates.ID
WHERE Msgs.UpdateTime < MaxTimes.MaxTime

Note that all I did was copy your original SELECT statement, and change the first line of code from SELECT ...to DELETE Msgs. You have to specify the table name, as there are more than one table expression in the FROM part of your query.

Dan
  • 10,480
  • 23
  • 49
1
DELETE  DUB
FROM    Messages AS DUB
WHERE   EXISTS (
            SELECT  SUB.ID
            FROM    Messages AS SUB
            WHERE   SUB.ID = DUB.ID
            GROUP BY SUB.ID
            HAVING MAX(SUB.UpdateTime) > DUB.UpdateTime
        )

I find this kind of correlated sub-query to be more readable.

David Korb
  • 31
  • 3