1

Hello I have the below SQL query that is taking on average 40 minutes to run, one of the tables that it references has over 7 million records in it.

I have ran this through the database tuning advisor and applied all recommendations, also I have assesed it within the activity monitor in sql and no further indexes etc have been recommended.

Any suggestions would be great, thanks in advance

WITH CTE AS 
(
    SELECT r.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    r.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM Result R
    INNER JOIN Job J ON R.JobId = J.Id
    INNER JOIN User C ON J.UserId = C.Id
    WHERE 
    ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
    AND ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT)
    AND ISNULL(R.Email,'') <> '' -- has an email address
    AND ISNULL(R.EmailSent, CAST(0 AS BIT)) = CAST(0 AS BIT) -- email has not been sent
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND ISNULL(R.EmailStatus,'') = '' -- email has not been sent
    AND ISNULL(R.IsEmailSubscribe, 'True') <> 'False' -- not unsubscribed
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE  ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- list of ids
    )
    AND J.Id NOT IN (
        -- list of ids
    )
    AND J.ClientId NOT IN 
    (
        -- list of ids
    )
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, ConsultantId, ConsultantEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1

Please see my updated query below:

WITH CTE AS 
(
    SELECT R.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    R.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM RESULTS R
    INNER JOIN JOB J ON R.JobId = J.Id
    INNER JOIN Consultant C ON J.UserId = C.Id
    WHERE 
    J.DCApproved = 1
    AND (J.Closed = 0 OR J.Closed IS NULL)
    AND (R.Email <> '' OR R.Email IS NOT NULL)
    AND (R.EmailSent = 0 OR R.EmailSent IS NULL)
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND (R.EmailStatus = '' OR R.EmailStatus IS NULL)
    AND (R.IsEmailSubscribe = 'True' OR R.IsEmailSubscribe IS NULL)
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE (u.EmailAddress = R.Email OR (u.EmailAddress IS NULL AND R.Email IS NULL))

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.ClientId NOT IN 
    (
        -- LIST OF IDS
    )
)

INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, UserId, UserEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1


GO
Matthew Stott
  • 395
  • 1
  • 4
  • 15
  • 3
    `ISNULL` in your `WHERE` is going to ruin any SARGability your query had. Don't use it in your `WHERE` use the format `({Expression} = {Value} or {Expression} IS NULL)` – Thom A Oct 17 '18 at 08:59
  • 1
    @Larnu I believe this comment is productive enough to be an answer by itself. – George Menoutis Oct 17 '18 at 09:01
  • @GeorgeMenoutis you're probably right. I wanted to review the rest of the query first, but seeing that was a very early spot. – Thom A Oct 17 '18 at 09:02
  • The clause `R.Email <> '' OR R.Email IS NOT NULL` is a little redundate. If `R.Email` has a value of `''` then it doesn't have a value of `NULL`; so will evaluate to true (meaning values with the value `''` will be returned). You probably only need `R.Email <> ''` here, as `NULL <> ''` evaluates to Unknown, which isn't true. – Thom A Oct 17 '18 at 12:03
  • Even now just running the SELECT STATEMENT e.g. SELECT CTE.ResultId, etc FROM CTE is causing my server to run at 100% ram – Matthew Stott Oct 17 '18 at 13:22
  • We're probably going to need to the execution plan ([Paste The Plan](https://www.brentozar.com/pastetheplan/)) and DDL (table and relevant indexes) at this stage now. I'd suggest posting it as a new question, as this one has been marked as answered and less likely to gain traction. This question solves the SARGability problem, and gave a huge boost. Now we need to address things like Indexes, which is a completely different kettle of fish. – Thom A Oct 17 '18 at 13:23
  • Sorry what do you mean by DDL – Matthew Stott Oct 17 '18 at 13:34
  • I have posted: https://stackoverflow.com/questions/52856537/sql-long-running-query-maxing-out-server-resource-e-g-ram-cpu – Matthew Stott Oct 17 '18 at 13:50

2 Answers2

4

Using ISNULL in your WHERE and JOIN clauses is probably the main cause here. Using functions against columns in your query causes the query to become non-SARGable (meaning that it can't use any of the indexes on your table(s) and so it has the scan the whole thing). Note; using functions against variables, in there WHERE is normally fine. For example WHERE SomeColumn = DATEADD(DAY, @n, @SomeDate). Things like WHERE SomeColumn = ISNULL(@Variable,0) have the smell of a "catch-all query", so can be performance hitters; depending on your set up. This isn't the discussion at hand though.

For clauses like ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT) this is therefore a big headache for the query optimiser and your query is riddled with them. You'll need to replace these with clauses like:

WHERE (J.Closed = 0 OR J.Closed IS NULL)

Although it makes no difference, there's no need to CAST the 0 there either. SQL Server can see you're making a comparison to a bit and will therefore interpret the 0 as one as well.

You also have a EXISTS with the WHERE clause ISNULL(u.EmailAddress, '') = ISNULL(R.Email, ''). This will need to become:

WHERE (u.EmailAddress = R.Email
  OR   (u.EmailAddress IS NULL AND R.Email IS NULL))

You'll need to change all of your ISNULL usage in your WHERE clauses (the CTE and the subqueries) and you should see a decent performance increase.

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • WOW amazing, such a comprehensive answer, really appreciate that, going to implement this now and will post the results – Matthew Stott Oct 17 '18 at 09:29
  • Check your query plan - 1:48 is about 1:30 more than I would expect WORST CASE. – TomTom Oct 17 '18 at 10:05
  • @MatthewStott glad to have helped. Sounds like there *might* be room for improvement still (1:48 is still a long running query in a lot of places) but a significant improvement already. Add the latest version of your query in the question; let's see if we can't improve it further. – Thom A Oct 17 '18 at 10:05
  • Larnu - thank you, yes I absolutely will do as still when I run the query my server spikes at 100% and the server has good resource – Matthew Stott Oct 17 '18 at 11:22
  • Hello, I have updated my post with the new query, unfortnately I cannot show the execution plan as when I run it my server spikes at 100% CPU – Matthew Stott Oct 17 '18 at 11:58
1

Generally, 7 million records are a joke for modern databases. If you alk problems, you are supposed to talk problems on billions of rows, not 7 millions.

Which indicates problems with the query. High CPU is generally a sign of non matching fields (compare string in one table to number in another ) or... functions called too often. Long running normally is a sign of either missing indices or.... non sargeability. Which you really do a lot to force.

Non-Sargeability means taht indices CAN NOT be used. Example of this is all this:

ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)

The ISNULL(field, value) means that an index on field is not usable - baically "goodby index, hello table scan". It also means - well....

(J.Approoved = 1 or J.Approoved IS NULL)

has the same meaning, but it sargeable. Pretty much EVERY of your conditions is written in a non sargeable way - welcome to db hell. Start rewriting.

You may want to read up more on sargeability at https://www.techopedia.com/definition/28838/sargeable

Also make sure you ahve indices on all relevant foreign keys (and the referenced primary keys) - otherwise, again, welcome table scans.

TomTom
  • 61,059
  • 10
  • 88
  • 148