I have read multiple sources and still don't understand where a big difference comes from in a query I have for Microsoft SQL Server.
I need to count different alerts linked to vehicles (IdMateriel
is synonymous to the id of the vehicle) based on types (CodeAlerte
), state (Etat
), and a Top
true/false column, but beofre the different counts I need to select the data.
TLDR : There are two parameters, the current date as an SQL DATETIME, and the VARCHAR(MAX) string of entity codes separated by commas, which I split using STRING_SPLIT to use them either in WHERE IN
clause or INNER JOIN
. Using it in the first clause is ~10x faster than the second clause although it seems equivalent to me. Why?
First, the queries are based on the view created as follows:
CREATE OR ALTER VIEW [dbo].[AlertesVehicules]
WITH SCHEMABINDING
AS
SELECT dbo.Alerte.IdMateriel, dbo.Materiel.EntiteGestion, dbo.Alerte.IdTypeAlerte,
dbo.TypeAlerte.CodeAlerte, dbo.TypeAlerte.TopAlerteMajeure, dbo.Alerte.Etat,
Vehicule.Top, Vehicule.EtatVehicule,COUNT_BIG(*) AS COUNT
FROM dbo.Alerte
INNER JOIN dbo.Materiel on dbo.Alerte.IdMateriel= dbo.Materiel.Id
INNER JOIN dbo.Vehicule on dbo.Vehicule.Id= dbo.Materiel.Id
INNER JOIN dbo.TypeAlerte on dbo.Alerte.IdTypeAlerte = dbo.TypeAlerte.Id
WHERE dbo.Materiel.EntiteGestion is NOT NULL
AND dbo.TypeAlerte.CodeAlerte IN ('P07','P08','P09','P11','P12','P13','P14')
GROUP BY dbo.Alerte.IdMateriel, dbo.Materiel.EntiteGestion, dbo.Alerte.IdTypeAlerte,
dbo.TypeAlerte.CodeAlerte, dbo.TypeAlerte.TopAlerteMajeure, dbo.Alerte.Etat,
Vehicule.Top, Vehicule.EtatVehicule
GO
CREATE UNIQUE CLUSTERED INDEX IX_AlerteVehicule
ON dbo.AlertesVehicules (EntiteGestion,IdMateriel,CodeAlerte,Etat,TopAlerteMajeure);
GO
This first version of the query takes ~100ms:
SELECT DISTINCT a.IdMateriel, a.CodeAlerte, a.Etat, a.Top INTO #tmpTabAlertes
FROM dbo.AlertesVehicules a
LEFT JOIN tb_AFFECTATION_SECTION ase ON a.IdMateriel = ase.ID_Vehicule
INNER JOIN (SELECT value AS entiteGestion FROM STRING_SPLIT(@entiteGestion, ',')) eg
ON a.EntiteGestion = eg.entiteGestion
WHERE
a.CodeAlerte IN ('P08','P09')
AND @currentDate <= ISNULL(ase.DateFin, @currentDate)
According to SQL Sentry Plan Explorer, the actual execution plan starts with an index seek taking ~30% of the time on dbo.Alerte with the predicate Alerte.IdTypeAlerte=TypeAlerte.Id
, outputting 369 000 rows of Etat
, IdMateriel
and IdTypeAlerte
, which it then directly filters down to 7 742 based on predicate PROBE(Opt_Bitmapxxxx,Alerte.IdMateriel)
, and then inner joins taking ~25% of the time with the 2 results of another index seek of TypeAlerte
but with predicates TypeAlerte.CodeAlerte = N'P08' and =N'P09'
. So just these two parts take > 50ms, but I don't understand why there are so many initial results.
The second version takes ~10ms :
SELECT DISTINCT a.dMateriel, a.CodeAlerte, a.Etat, a.Top INTO #tmpTab
FROM dbo.AlertesVehicules a
LEFT JOIN tb_AFFECTATION_SECTION ase ON a.IdMateriel = ase.ID_Vehicule
WHERE
a.EntiteGestion IN (SELECT value FROM STRING_SPLIT(@entiteGestion, ','))
AND a.CodeAlerte IN ('P08','P09')
AND (@currentDate <= ISNULL(ase.DateFin, @currentDate))
For this one, SQL Sentry Plan Explorer starts with a View Clustered Index Seek directly on the view AlertesVehicules
with Seek Predicates AlertesVehicule.EntiteGestion > Exprxxx1 and <Exprxxx2
and Predicate AlertesVehicules.CodeAlerte=N'P08' and =N'P09'
Why are those two treated so differently when it seems to me that they are exactly equivalent?
For references, here are some threads I already looked into, but didn't seem to find an explanation in (except that there shouldn't be a difference):
Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))