0

I have a query that joins a table using this condition:

(Apps.mfrId = Manufacturer.id OR Apps.mfrId IS NULL OR Apps.mfrId = 0)

The query takes 17 seconds to run, and using a profiler the query causes between 5 - 30 messages (it varies each run) with error "Error: 1222, Severity: 16, State: 18".

I leave the query exactly the same, but I change the above condition to read:

(Apps.mfrId = Manufacturer.id OR ISNULL(apps.mfrId, 0) = 0)

...and now the same query with this one change runs in 140ms and has no lock errors.

Why might this happen?

Note, before testing on tables Manufacturer and Apps I have run DBCC CHECKTABLE with the repair_bebuild option, and have rebuilt the indices on both tables also.

Also note, there are no other queries running against the database at the same time.

Here is a simplified version of the query which has errors:

select top 2000 Apps.object_id
from 
    Manufacturer
    INNER JOIN Apps ON (
        Apps.mfrId = Manufacturer.Id
        OR Apps.mfrId IS NULL
        OR Apps.mfrId = 0 
    ) 
where
    Apps.OBJECT_ID = 6879149

If instead of "top 2000" I use "top 1000" the query completes in just over 100ms.

Developer Webs
  • 983
  • 9
  • 29
  • Did you show the execution plan for each query? – SS_DBA Oct 06 '17 at 13:56
  • that join condition looks awfully odd. Is this a left join, inner join, ???? I'm interested in why you'd want a join condition which would be overridden if the apps.mfrid was null. sounds like you simply want a full join. What's the end goal? – S3S Oct 06 '17 at 14:01
  • @scsimon It's an inner join. I'm working with tables I did not design. Apps.mfrId is a nullable foreign key to Manufacturer.id (with no foreign key constraints setup). For some reason the original design allows for NULL or 0, and either NULL or 0 have the same meaning (that no valid value is set). – Developer Webs Oct 06 '17 at 14:50
  • @WEI_DBA I have not yet. I'm not a DBA, I'm just a lowly software developer. I'll have a look at the plan, but it's possible that it may not help me anyway. – Developer Webs Oct 06 '17 at 14:51
  • My guess is that you want to left join on Apps.mfrId = Manufacturer.id. Also, your two examples are not the same. Apps.Field62 is not Apps.mfrId, i would expect. Is it your intention to join a row with id 0 when no value is supplied in Apps? In which case you might want "ISNULL(apps.mfrId, 0) = Manufacturer.id". – Greg Oct 06 '17 at 15:55
  • Apps.Field62 should have read Apps.mfrId, mistyped. Corrected now. – Developer Webs Oct 06 '17 at 15:57
  • @Greg The problem with the Left Join though is it doesn't do the correct thing. I want either the App to join to one manufacturer, or all of them. The App must never be NULL (which is what the left join would give me). – Developer Webs Oct 06 '17 at 16:26

1 Answers1

0

If Apps.mfrId is null, you are joining in every manufacturer.

From your testing, "OR ISNULL(apps.mfrId, 0) = 0" does this cheaper than "OR Apps.mfrId IS NULL OR Apps.mfrId = 0"

These do seem pretty equivalent, and you already do have a fix, so the question is just why and/or how to make the non-performer work.

When it comes to such performance, if everything you stated is correct, as we start to ask why, we start to point to the Query Optimizer. If it performs differently for the same parameters, it will have a different query plan. You'll probably see a table scan instead of index usage, or some other explanation to the bad performance.

I would encourage you to compare the query plans, or get someone else to, to help bring in an answer as to what it is doing differently. But, you already can tell it's doing something differently.

One possibility is that when Query Optimizer fixes are released, they by default are not turned on unless flag 4199 (for all fixes, or other flags for specific fixes) are turned on. This is because a general fix might be good for most, but also might break an application that was optimized in the environment of existing the prior quirks.

https://dba.stackexchange.com/questions/102292/trace-flag-4199-enable-globally

Does turning 4199 on help?

select top 2000 Apps.object_id
from 
    Manufacturer
    INNER JOIN Apps ON (
        Apps.mfrId = Manufacturer.Id
        OR Apps.mfrId IS NULL
        OR Apps.mfrId = 0 
    ) 
where
    Apps.OBJECT_ID = 6879149
OPTION(QUERYTRACEON 4199)

Another related topic is parameter sniffing. Sometimes a query plan can get cached that is optimal for one parameter but horrible for another parameter. In your case one app might return 1 manufacturer, but another app might return all manufacturers, which is why this is worth mentioning. This usually shows up as the same code inconsistently performing poorly when given different parameters. You can try turning off parameter sniffing or forcing recompiling to help diagnose if that appears to be part of the issue.

What are the main differences between OPTION(OPTIMIZE FOR UNKNOWN) and OPTION(RECOMPILE)?

I have seen situations where if felt like the Query Optimizer had given up on perusing indexes, just because the statement became too complex. But this doesn't seem likely in your example; other than the topic of query optimizer bugs get fixed, but to utilize them you still have to turn on query flags, or else that patch you install may very well do nothing.

Sometimes you can also try to help guide the sql optimizer. If there is an index that should always be used, that can be given as a query hint.

I'd also be curious to know if the following removes the issue:

select top 2000 Apps.object_id
from
(
    select Apps.object_id, Apps.mfrId 
    from Apps
    where Apps.OBJECT_ID = 6879149
) Apps
left join Manufacturer  ON (
        Apps.mfrId = Manufacturer.Id
        OR Apps.mfrId IS NULL
        OR Apps.mfrId = 0 )

If all query optimizer fixes are turned on, and the problem still persists, we have to ask why the query optimizer did this. It can only use indexes that exist, and it can only figure out if using an index will be beneficial if statistics exist. Meanwhile, outdated statistics will lead to bad choices. It can be good to rebuild/reorganize indexes and update statistics periodically. You may want to try doing so and see if it has any affect.

Conclusion

Since you already have a working fix, do use it. But your question is why do two very similar things achieve very different results. Assuming the same parameter is being used, and including the fact that your two options are so similar, the issue points to the query optimizer making a bad choice. This suggests you may need to both install the latest patches (if not installed) and also enable 4199 to actually enable all the query optimizer fixes that you have already installed via sql server patches. This includes adding OPTION(QUERYTRACEON 4199) to the bottom of your sql, or enabling 4199 globally, or similar.

Greg
  • 2,410
  • 21
  • 26
  • The idea is this. We have an Application (an application being where a part is used on a vehicle). If the Application has the manufacturer set, then that vehicle is associated with that one manufacturer. If the vehicle does not have a manufacturer set (its mfrid is 0 or null), then it's associated with all manufacturers. – Developer Webs Oct 06 '17 at 16:02
  • Yes, but why is the ISNULL variation hundreds of times faster, and why does it not get tons of error 1222 when the other version of the query does get 5 - 30 1222 errors. – Developer Webs Oct 06 '17 at 16:12
  • Ya, I've run it through the optimizer, and used the compare option in SQL management studio to compare execution plans. I get the concept of what it's showing me, but it's not terribly useful for me in this case. – Developer Webs Oct 06 '17 at 16:22
  • The query you gave reduces the query execution time from roughly 4.5 minutes to less than a second. As for the OPTION(QUERYTRACEON 4199) statement, I'll have to talk to our IT guy as the query is run by a user without sufficient permissions (User 'test' does not have permission to run DBCC TRACEON.). The indices were rebuilt earlier today on the tables. – Developer Webs Oct 06 '17 at 17:44
  • Appending "OPTION(QUERYTRACEON 4199)" to the end of my query does not seem to improve the performance of the query. – Developer Webs Oct 06 '17 at 18:07
  • The fix I found I can't really use. The query is dynamically generated and executed. I'd have to dig through ancient VB6 code to correct the generation of the statement (and pray it doesn't break anything else). – Developer Webs Oct 06 '17 at 18:10