4

As the title says, I've been running this command to delete some records from a table but I noticed that it wasn't deleting the same records listed if I ran a query selecting the TOP(X) records right before.

Shouldn't the DELETE command follow the same default ordering that SELECT does? If not, why?

I don't know if it matters, but here is the delete command I'm running:

DELETE TOP (100000) t
FROM Ticket AS t
    LEFT JOIN Series s on t.SeriesId = s.id
    LEFT JOIN Payment p on t.id = p.TicketId
WHERE s.EndDate <= DATEADD(MONTH, -1, GETDATE()) AND t.ExportDate is null AND p.TicketId is null

For the select, just replace the first line with SELECT TOP(100000) t.*

Henrique Miranda
  • 1,030
  • 1
  • 13
  • 31
  • 5
    Nope. There is no "default ordering". If you don't specify an `ORDER BY` any ordering is valid. The way to specify an ordering for a delete would be to put a select into a CTE with a deterministic `order by` and then delete from the CTE – Martin Smith Mar 11 '19 at 17:55
  • Records in a table are stored to disk in a way that optimizes their storage and their retrieval. That's it. It doesn't care about order of records unless you do by specifying an ORDER BY clause. – JNevill Mar 11 '19 at 17:57
  • 2
    Why `LEFT JOIN Series s`? The `WHERE s.EndDate` makes it an inner join anyway – Martin Smith Mar 11 '19 at 17:57
  • 2
    Rule Number one: If you're going to use "TOP", always use "ORDER BY". Always. – pmbAustin Mar 11 '19 at 18:09

2 Answers2

13

Without an ORDER BY, the order of the data CANNOT be guaranteed.

S3S
  • 24,809
  • 5
  • 26
  • 45
UnhandledExcepSean
  • 12,504
  • 2
  • 35
  • 51
3

Nope. There is no "default ordering". If you don't specify an ORDER BY any ordering is valid.

The resulting order will be execution plan dependant and may not even be stable within a given execution plan (e.g. parallelism may distribute rows to different threads with the exact distribution depending on how busy each scheduler was at the time)

Moreover even ignoring the ORDER BY issue the semantics of the TOP are different for the SELECT and DELETE. If there are multiple matches in Series for a given Ticket (and matching the EndDate) the SELECT will count the joined rows whereas the DELETE will collapse these down to 1 before the delete and just count the rows deleted from Ticket.

To do the DELETE in a deterministic manner you can first construct a SELECT statement that is valid for this purpose (with all joins replaced with EXISTS/NON EXISTS) and then you can put that in a CTE and delete from that.

WITH CTE
     AS (SELECT TOP (100000) t.*
         FROM   Ticket AS t
         WHERE  t.ExportDate IS NULL
                AND EXISTS (SELECT *
                            FROM   Series s
                            WHERE  t.SeriesId = s.id
                                   AND s.EndDate <= DATEADD(MONTH, -1, GETDATE()))
                AND NOT EXISTS(SELECT *
                               FROM   Payment p
                               WHERE  t.id = p.TicketId)
         ORDER  BY t.id)
DELETE FROM CTE 
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • The only reason I'm asking this is because I'm receiving exactly the same results for ``SELECT TOP(X)``, no matter how many times I run it. – Henrique Miranda Mar 11 '19 at 18:46
  • This will be execution plan dependant. If you get a non parallel execution plan and there are no data modifications going on that will move rows around or cause stats to update and the plan to change then yes it is quite likely this will happen but you shouldn't rely on any order you observe being reproduced for that statement, let alone when you use it in a different context like changing the `SELECT` to a `DELETE` – Martin Smith Mar 11 '19 at 18:50