How can it be optimize a long running query?

Question

I have a query that consumes a lot of CPU causing some timeouts:

select cast(count(*) as INT) as colCount 
from TableGR tgr 
where ((tgr.OriginCode is not null) 
        and (tgr.OriginCode in (@p0 , @p1 , @p2 , @p3)) 
        or tgr.Type=@p4 
        and (tgr.OriginVat in (@p5 , @p6 , @p7 , @p8)) 
        or (tgr.DestinCode is not null) 
        and (tgr.DestinCode in (@p9 , @p10 , @p11 , @p12)) 
        or (exists (select t1.Id 
                  from Transporters t1 
                  where tgr.GarId=t1.GarId)) 
        and (exists (select t2.Id 
                   from Transporters t2 
                   where tgr.GarId=t2.GarId and (t2.Vat in (@p13 , @p14 , @p15 , @p16))))
     ) 
     and (tgr.DeletedUtc is null);

I imagine that the query executes multiple subqueries are the cause for it, and I consider to add a new column to table TableGR to contain a comma separated string with t1.Id and t2.Id. Another option I consider is to add two new column to table TableGR (column t1Id, t2Id).

Questions seeking performance help should include DDL,DML Of the tables involved along with test data..if your test data is large,try scripting out schema and stats for the table(right click database->generate scripts->select specific database objects->in next screen select advanced and choose Script statistics) and paste it in question..With this info any one repro the same issue you are facing.Otherwise it becomes very difficult to answer your question .Pasting server version also helps — TheGameiswar, Nov 08 '18 at 17:03
Look at the execution plan using SQL Server Management Studio. You'l see lots of SCANs rather than SEEKs. You can also use https://www.brentozar.com/pastetheplan/ to show ***us*** the execution plan. — MatBailie, Nov 08 '18 at 17:03
I would rather use left join instead of exists clause to check if ID exists and finally filter records with a where clause. Also, use "WITH (NOLOCK)" in case there are more than one session that can trigger the query. — WhoamI, Nov 08 '18 at 17:07
@Ayush using `WITH NOLOCK` is a horrible idea. It means "read dirty data", not "don't take locks". It's *DEFINITELY NOT* what you should do to improve performance — Panagiotis Kanavos, Nov 08 '18 at 17:08
You may not need a CAST on COUNT(*) as well !! Count(*) always returns an integer. — WhoamI, Nov 08 '18 at 17:08
I agree but definitely an overhead which is not impacting the result. NOLOCKs for sure is not a good thing unless you have a big transactional system. So it is left open to the question raiser. — WhoamI, Nov 08 '18 at 17:11
Format your query and check the conditions. Are you *sure* you want that sequence of `AND` and `OR` operators? Are the nestings those you want? Some of those conditions aren't needed, for example `or (tgr.DestinCode is not null) and (tgr.DestinCode in (@p9 , @p10 , @p11 , @p12)) ` is equivalent to `or (tgr.DestinCode in (@p9 , @p10 , @p11 , @p12))`. The `IN` clause will return `FALSE if the field contains a NULL. — Panagiotis Kanavos, Nov 08 '18 at 17:11
@Ayush that unless isn't true. That's **exactly when you shouldn't use it**. It takes **more restrictive** locks while allowing **dirty, duplicate and ghost reads**. In a busy system it **causes more blocks** and has a **higher chance of reading garbage or outright failing** — Panagiotis Kanavos, Nov 08 '18 at 17:13
@PanagiotisKanavos whats the best way to avoid blocks then? I have seen cases where blocks occurs and all session get hanged. Later, DBA bounces it to free. — WhoamI, Nov 08 '18 at 17:16
@ze.espogeira The fields involved in the query should be covered by indexes, and so should the fields used in foreign keys, eg `GarId`. Remember, `AND` has a higher precedence than `OR`. This means that the two subqueris at the end overlap - the second one wins because it's more restrictive but both will cause seeks, scans etc — Panagiotis Kanavos, Nov 08 '18 at 17:17
@Ayush write *good* queries, Use proper indexes. Don't look for hacks. Don't use long transactions. Use snapshot isolation if possible but don't expect it to cover up bad coding — Panagiotis Kanavos, Nov 08 '18 at 17:18

score 0 · Accepted Answer · answered Nov 08 '18 at 17:13

You can change your query to use a join instead of if exists to optimize your query.

I consider to add a new column to table TableGR to contain a comma separated string with t1.Id and t2.Id`

bad idea.

Another option I consider is to add two new column to table TableGR (column t1Id, t2Id)

this might also work.

There are so many possible optimizations, which may or may not help you depending on your data. That is why it's best to look at what exactly is taking the most time in your query execution using the view query execution plan

score 0 · Answer 2 · answered Nov 08 '18 at 17:20

If I reformat your query I can see that it is made from searches on many different columns. That makes it very hard for an optimiser to make use of any indexes.

select
  cast(count(*) as INT) as colCount 
from
  TableGR tgr 
where
(
    (tgr.OriginCode is not null) 
and (tgr.OriginCode in (@p0 , @p1 , @p2 , @p3)) 

or
    tgr.Type=@p4 
and (tgr.OriginVat in (@p5 , @p6 , @p7 , @p8))

or

    (tgr.DestinCode is not null) 
and (tgr.DestinCode in (@p9 , @p10 , @p11 , @p12))

or

    (exists (select t1.Id from Transporters t1 where tgr.GarId=t1.GarId)) 
and (exists (select t2.Id from Transporters t2 where tgr.GarId=t2.GarId and (t2.Vat in (@p13 , @p14 , @p15 , @p16))))
) 
and
  (tgr.DeletedUtc is null);

One way to mitigate that is to break it into simpler queries that can make use of indexes on your table.

(I've simplified x IS NOT NULL AND x IN (a,b,c) to x IN (a,b,c), because if x is null then it's never in any list...)

SELECT
  COUNT(*)   AS colCount
FROM
(
  -- Could use an index on (DeletedUtc, OriginCode)
  SELECT PrimaryKeyColumn
    FROM TableGR tgr 
   WHERE tgr.DeletedUtc IS NULL
     AND tgr.OriginCode in (@p0 , @p1 , @p2 , @p3)

  UNION

  -- Could use an index on (DeletedUtc, Type, OriginCode)
  SELECT PrimaryKeyColumn
    FROM TableGR tgr 
   WHERE tgr.DeletedUtc IS NULL
     AND tgr.Type=@p4 
     AND tgr.OriginVat in (@p5 , @p6 , @p7 , @p8)

  UNION

  -- Could use an index on (DeletedUtc, DestinCode)
  SELECT PrimaryKeyColumn
    FROM TableGR tgr 
   WHERE tgr.DeletedUtc IS NULL
     AND tgr.DestinCode in (@p9 , @p10 , @p11 , @p12)

  UNION

  -- Could use an index on (DeletedUtc, GarID)
  SELECT PrimaryKeyColumn
    FROM TableGR tgr 
   WHERE tgr.DeletedUtc IS NULL
      -- Why the Two EXISTS() expressions here?  If the second is TRUE the first is always also TRUE, no?
     AND (exists (select t1.Id from Transporters t1 where tgr.GarId=t1.GarId)) 
     AND (exists (select t2.Id from Transporters t2 where tgr.GarId=t2.GarId and (t2.Vat in (@p13 , @p14 , @p15 , @p16))))
)
  AS targets

Note that I've used UNION rather than UNION ALL. This is incase one row can fulfil more that one of the criteria (UNION "de-duplicates" the results, preventing one row being counted multiple times.)

If you know that any one row can only be present in a single query, use UNION ALL instead.

Then go back to your execution plan and see if there are any other indexes or other optimisations that may help.

How can it be optimize a long running query?

2 Answers2