What I'm experimenting here is how DELETE statements perform on a very simple example. I'm currently using SQL Server 2017 (I also tried with SQL Server 2014 and the results were similar).
I have two tables: Parent
and Child
. Child
has a foreign key to Parent (Parent_ID)
.
Parent:
Parent_ID Name
-----------------------
1 P1
2 P2
Child:
Child_ID Parent_ID Data
-----------------------------
1 1 P1C1
2 2 P2C1
3 2 PPPPCCCC
4 2 P2C1
5 2 PPPPCCCC
(around 4 million more rows with Parent_ID=2)
I always thought that adding an index on the foreign key (Parent_ID
in Child
here) was a good idea. But today, I have tried the behavior of DELETE in a somewhat extreme case - but I'm sure this kind of case could happen in real life - (4 millions rows with Parent_ID=2 in the Child table, only one row for Parent_ID=1).
If I try to delete rows with Parent_ID = 1, it looks good: it is fast enough, the index is used, the amount of logical reads seems to be fine (12 logical reads: I am no expert and don't know if it's really OK for such small amount of data).
Now here is what I don't understand (and don't like):
I try to delete all records in Child where Parent_ID=2:
BEGIN TRAN
DELETE FROM child
WHERE parent_id = 2
ROLLBACK TRAN
The IO statistics show this (for the DELETE):
Table 'Child'. Scan count 1, logical reads 38486782, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
38486782 logical reads... isn't that huge? I have tried to update statistics to be sure.
UPDATE STATISTICS Child WITH FULLSCAN
Then ran my query again => same results. May be the problem is the Index Delete on IX_Child_Parent_ID?
After disabling the index on the foreign key, things went much better:
Table 'Child'. Scan count 1, logical reads 202233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Note: SQL Server suggests to create an index for the FK.
202233 logical reads sounds much better... at least for the specific case of Parent_Id=2.
The question is: why is SQL Server using the index and did not choose the clustered index scan method when it knows there are about 4 000 000 rows for Parent_ID = 2? Or may be it doesn't know? Aren't the statistics supposed to "help" SQL Server to know this kind of information?
I'm probably missing something.
(I have double checked and the statistics are - seemed to be - OK after the index is created: