Is this join overcomplicated?

Question

I have inherited an application made by a previous developer. Some of the database calls are running slow in places where there is a large amount of data. I have found in general the SQL code is well written but there are places that make me think, 'what the..?'

Here is one example:

select a.*
from bs_ResearchEnquiry a 
left join bs_StateWorkflowState_Map b
on (
   select c.MapId from bs_StateWorkflowState_Map c 
   where c.StateId = a.StateId AND c.StateWorkflowId = a.StateWorkflowId
   )=b.MapId     
where
    b.IsFinal=1

The MapId field is a unique primary key to the bs_StateWorkflowState_Map table.
StateId and StateWorkflowId together also form a unique key.
There will always be a match on these keys for rows in the foreign table bs_ResearchEnquiry.

Therefore, could I rewrite the left join more efficiently, and safely, as:

inner join bs_StateWorkflowState_Map b
on b.StateId = a.StateId AND b.StateWorkflowId = a.StateWorkflowId

Or was the original developer trying to achieve something I've missed?

UPDATE: I have just tried the simpler join and found the opposite effect! Execution time has increased from a few seconds to well over a minute. So although the original syntax appears over-engineered, it could be that the developer was using the most efficient method after all. Not yet sure why this is the case. – userSteve, Sep 05 '17 at 10:05
Have a look at the [query plan](https://stackoverflow.com/a/7359705/50552). Does performance improve if you keep the `left join`? – Andomar, Sep 05 '17 at 10:07
@Andomar yes, the left join does improve things. My syntax with inner join takes 90 seconds; my syntax with left join takes 45 seconds; the original syntax takes <5 seconds. My aim was to get it to <2 seconds. – userSteve, Sep 05 '17 at 10:08
Maybe I should change this question to: why is the original complicated join faster than the simplified version? – userSteve, Sep 05 '17 at 10:23
It can only mean one of two things: 1) the optimizer is doing a bad job here, or 2) one of your three assumptions is wrong or not established by constraints (and hence not visible to the DBMS). – Thorsten Kettner, Sep 05 '17 at 10:27
FINAL UPDATE: The StateId and StateWorkflowId fields were indexed, but separately. I replaced this with a compound index and performance has improved greatly, to desirable levels. Thanks for all the input. – userSteve, Sep 05 '17 at 12:32

score 4 · Accepted Answer · answered Sep 05 '17 at 10:01

Your simplification looks good to me. Note that the presence of:

where b.IsFinal = 1

means that the outer join is effectively an inner join: for rows without a match, b.IsFinal is NULL, so the where clause filters them out.
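To illustrate the point about the where clause, here is a minimal sketch using the table and column names from the question. It assumes the simplified join condition proposed in the question; the two statements should return the same rows, because any row of a without a partner in b has NULL in every b column and can never satisfy b.IsFinal = 1.

```sql
-- Left join: unmatched rows of a are kept at first with NULLs for b,
-- but the WHERE clause then discards them (NULL = 1 is not true).
select a.*
from bs_ResearchEnquiry a
left join bs_StateWorkflowState_Map b
  on b.StateId = a.StateId and b.StateWorkflowId = a.StateWorkflowId
where b.IsFinal = 1;

-- Inner join: the same result set, written as what it effectively is.
select a.*
from bs_ResearchEnquiry a
inner join bs_StateWorkflowState_Map b
  on b.StateId = a.StateId and b.StateWorkflowId = a.StateWorkflowId
where b.IsFinal = 1;
```

The results match, but the optimizer may still pick different plans for the two forms, so it is worth comparing the actual execution plans rather than assuming equal speed.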

Andomar


Thorsten Kettner · Answer 2 · 2017-09-05T10:24:21.397

0

With your explanation on keys given, you are right, the query can be simplified. It selects records from bs_ResearchEnquiry where the associated bs_StateWorkflowState_Map record is final. So use EXISTS:

select *
from bs_ResearchEnquiry re
where exists
(
  select *
  from bs_StateWorkflowState_Map m
  where m.StateId         = re.StateId
    and m.StateWorkflowId = re.StateWorkflowId
    and m.IsFinal = 1
);

(From your explanation on uniqueness, I gather that there already exist indexes on (StateId, StateWorkflowId) in both tables. If not, create them.)
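If the compound indexes turn out to be missing, something along these lines might do for SQL Server. This is only a sketch: the index names are made up for illustration, the unique index assumes the uniqueness of (StateId, StateWorkflowId) described in the question and is redundant if a unique constraint already covers those columns, and including IsFinal is an optional extra meant to let the lookup be answered from the index alone.

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Compound index on the referencing table, matching the correlated lookup.
create index IX_ResearchEnquiry_State_Workflow
    on bs_ResearchEnquiry (StateId, StateWorkflowId);

-- Compound unique index on the map table (assumes the pair is unique,
-- as stated in the question). IsFinal is included so the EXISTS check
-- does not need to visit the base table.
create unique index UX_StateWorkflowState_Map_State_Workflow
    on bs_StateWorkflowState_Map (StateId, StateWorkflowId)
    include (IsFinal);
```

Two separate single-column indexes on StateId and StateWorkflowId are not equivalent to one compound index, since the lookup filters on both columns at once.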

edited Sep 05 '17 at 10:24

answered Sep 05 '17 at 10:16


Well, in SQL Server multiple fields in an `IN` clause are invalid. You can use `EXISTS` instead. – Rokuto Sep 05 '17 at 10:19
where (x, y) in ... is not valid SQL – userSteve Sep 05 '17 at 10:21
@Rokuto: Well spotted. I always forget about this limitation in SQL Server, which really makes some queries less readable. I'll update. – Thorsten Kettner Sep 05 '17 at 10:21
@userSteve: Not valid in SQL Server, but many DBMS support it and it makes for very readable queries. A pity that SQL Server lacks it. I'll update. – Thorsten Kettner Sep 05 '17 at 10:22
