2

I have inherited an application made by a previous developer. Some of the database calls are running slow in places where there is a large amount of data. I have found in general the SQL code is well written but there are places that make me think, 'what the..?'

Here is one example:

select a.*
from bs_ResearchEnquiry a 
left join bs_StateWorkflowState_Map b
on (
   select c.MapId from bs_StateWorkflowState_Map c 
   where c.StateId = a.StateId AND c.StateWorkflowId = a.StateWorkflowId
   )=b.MapId     
where
    b.IsFinal=1
  • The MapId field is a unique primary key to the bs_StateWorkflowState_Map table.
  • StateId and StateWorkflowId together also form a unique key.
  • There will always be a match on these keys to rows in the foreign table bs_ResearchEnquiry

Therefore, could I rewrite the left join more efficiently, and safely, as:

inner join bs_StateWorkflowState_Map b
on b.StateId = a.StateId AND b.StateWorkflowId = a.StateWorkflowId

Or was the original developer trying to achieve something I've missed ?

Andomar
  • 232,371
  • 49
  • 380
  • 404
userSteve
  • 1,554
  • 1
  • 22
  • 34
  • UPDATE: I have just tried the simpler join to find the opposite effect! Execution time has increased from a few seconds to well over a minute. So although the original syntax appears over-engineered it could be the developer was using the most efficient method after all. Not yet sure why this is the case. – userSteve Sep 05 '17 at 10:05
  • have a look at the [query plan](https://stackoverflow.com/a/7359705/50552). Does performance improve if you keep the `left join`? – Andomar Sep 05 '17 at 10:07
  • @Andomar yes, the left join does improve things. My syntax with Inner Join takes 90 seconds My syntax with Left Join takes 45 seconds The original syntax takes <5 seconds My aim was to get it <2 seconds – userSteve Sep 05 '17 at 10:08
  • Maybe I should change this question to, Why is the original complicated join faster than the simplified version ? – userSteve Sep 05 '17 at 10:23
  • 1
    It can only mean one of two things: 1) the optimizer is doing a bad job here, or 2) one of your three assumptions is wrong or not established by constraints (and hence not visible to the DBMS). – Thorsten Kettner Sep 05 '17 at 10:27
  • FINAL UPDATE: The StateId and StateWorkflowId fields were indexed, but separately. I replaced this with a compound index and performance has greatly increased to desirable levels. Thanks for all the input – userSteve Sep 05 '17 at 12:32

2 Answers2

4

Your simplification looks good to me. Note that the presence of:

where b.IsFinal = 1

Means that the outer join is effectively inner join.

Andomar
  • 232,371
  • 49
  • 380
  • 404
0

With your explanation on keys given, you are right, the query can be simplified. It selects records from bs_ResearchEnquiry where the associated bs_StateWorkflowState_Map record is final. So use EXISTS:

select *
from bs_ResearchEnquiry re
where exists
(
  select *
  from bs_StateWorkflowState_Map m
  where m.StateId         = re.StateId
    and m.StateWorkflowId = re.StateWorkflowId
    and m.IsFinal = 1
);

(From your explanation on uniqueness, I gather that there already exist indexes on (StateId, StateWorkflowId) in both tables. If not, create them.)

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73