SQL Server: creating indexes on foreign keys

Question

What I'm experimenting here is how DELETE statements perform on a very simple example. I'm currently using SQL Server 2017 (I also tried with SQL Server 2014 and the results were similar).

I have two tables: Parent and Child. Child has a foreign key to Parent (Parent_ID).

Parent:

Parent_ID      Name
-----------------------
   1            P1
   2            P2

Child:

Child_ID    Parent_ID   Data
-----------------------------
  1         1           P1C1
  2         2           P2C1
  3         2           PPPPCCCC
  4         2           P2C1
  5         2           PPPPCCCC
  (around 4 million more rows with Parent_ID=2)

I always thought that adding an index on the foreign key (Parent_ID in Child here) was a good idea. But today, I have tried the behavior of DELETE in a somewhat extreme case - but I'm sure this kind of case could happen in real life - (4 millions rows with Parent_ID=2 in the Child table, only one row for Parent_ID=1).

If I try to delete rows with Parent_ID = 1, it looks good: it is fast enough, the index is used, the amount of logical reads seems to be fine (12 logical reads: I am no expert and don't know if it's really OK for such small amount of data).

Now here is what I don't understand (and don't like):

I try to delete all records in Child where Parent_ID=2:

BEGIN TRAN

    DELETE FROM child 
    WHERE parent_id = 2

ROLLBACK TRAN

The IO statistics show this (for the DELETE):

Table 'Child'. Scan count 1, logical reads 38486782, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

38486782 logical reads... isn't that huge? I have tried to update statistics to be sure.

    UPDATE STATISTICS Child WITH FULLSCAN

Then ran my query again => same results. May be the problem is the Index Delete on IX_Child_Parent_ID?

After disabling the index on the foreign key, things went much better:

Table 'Child'. Scan count 1, logical reads 202233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Note: SQL Server suggests to create an index for the FK.

202233 logical reads sounds much better... at least for the specific case of Parent_Id=2.

The question is: why is SQL Server using the index and did not choose the clustered index scan method when it knows there are about 4 000 000 rows for Parent_ID = 2? Or may be it doesn't know? Aren't the statistics supposed to "help" SQL Server to know this kind of information?

I'm probably missing something.

(I have double checked and the statistics are - seemed to be - OK after the index is created:

(note: commenting myself) maybe it's a stupid case, and everything is normal... but I would really be happy to have some advice from gurus^^ — ggo, Jul 08 '21 at 15:55

score 1 · Answer 1 · answered Jul 08 '21 at 19:49

For the purpose of a delete and with the cardinality of the data you have, the index on parent_id is not useful, as you have seen.

Given you know you need to delete 99% of the rows, doing so is highly inneficient for many reasons, not least the growth of the transaction log.

Every statement you execute is atomic and its own implicit transaction, if the delete were to fail mid-way ie a power cut, SQL Server needs to be able to roll the incomplete delete back, for which it uses the transaction log, so all the rows that are deleted will hit the transaction log.

In cases such as these it's much more performant to insert the rows to keep into a new table, drop the original table and then rename the new table to the original; you can also script the indexes/constraints from the original and apply them to the new table.

Where deleting a large proportion of rows but less than 50% of the table, other recommendations would be to split the job into batches and delete <5000 rows at a time (5000 is the rough threshold for lock-escalation).

Often it can help to create a View on the table selecting the top n rows to delete, ordered by a specific key, then delete from the View.

score 0 · Answer 2 · answered Jul 08 '21 at 16:50

0

May be I found the answer here: https://stackoverflow.com/a/3650886.

Looks like this is an expected behavior and that the problem is really updating the index (IX_Child_Parent_ID in my case).

answered Jul 08 '21 at 16:50

ggo

471
7
17

SQL Server: creating indexes on foreign keys

2 Answers2