Short Answer: Maybe.
Long Answer:
If there are very few different values in deleted_at
, MySQL will not use INDEX(deleted_at)
.
If there are a not of different non-null dates in deleted_at
, MySQL will use INDEX(deleted_at)
.
Most of the discussion (so far) has failed to bring the cardinality of this single-column index into consideration.
Note: This is not the same as a 2-value flag such as is_deleted
. It is useless to have a single-column index on such.
More discussion (from MySQL point of view)
https://laravel.com/docs/5.2/eloquent#soft-deleting says
Now, when you call the delete method on the model, the deleted_at column will be set to the current date and time. And, when querying a model that uses soft deletes, the soft deleted models will automatically be excluded from all query results.
From that, I assume this is occurring in the table definition:
deleted_at DATETIME NULL -- (or TIMESTAMP NULL)
And the value is initialized (explicitly or implicitly) to NULL
.
Case 1: Lots of new rows, none yet 'deleted': All the deleted_at
values are NULL
. In this case, the Optimizer will shun INDEX(deleted_at)
as not helping. In fact using the index would hurt because it would cost more to go through the entire index and the data. It would be cheaper to ignore the index and simply assume all rows are candidates for being SELECTed
.
Case 2: A few rows (out of many) have been deleted: Now deleted_at
has multiple values. Although Laravel only cares about IS NULL
vs IS NOT NULL
, MySQL sees it as a multi-valued column. But, since the test is for IS NULL
and most rows are still NULL
, the Optimizer's reaction is the same as for Case 1.
Case 3: A lot more rows are soft-deleted than still active: Now the index has suddenly become useful because only a small percentage of the table IS NULL
.
There is no exact cutoff between Case 2 and Case 3. 20% is a handy Rule of Thumb.
Now, from the execution point of view.
INDEX(deleted_at)
used for deleted_at IS NULL
:
- Drill down the Index BTree for the first row with
NULL
.
- Scan until
IS NULL
fails.
- For each matching row, reach over into the data BTree to get the row.
INDEX(deleted_at)
is not used:
- Scan the data BTree (or use some other index)
- For each data row, check that
deleted_at IS NULL
, else filter out that row.
Composite index:
It may be very beneficial to have a "composite" (multi-column) index starting with deleted_at
. Example:
INDEX(deleted_at, foo)
WHERE deleted_at IS NULL
AND foo BETWEEN 111 AND 222
This is very likely to use the index effectively regardless of what percentage of the table has deleted_at IS NULL
.
- Drill down the Index BTree for the first row with
NULL
and foo >= 111
.
- Scan until
IS NULL
or foo <= 222
fails.
- For each matching row, reach over into the data BTree to get the row.
Note that in an INDEX
, NULL
acts very much like any other single value. (And NULLs
are stored before other values.)