3
SELECT C.* FROM StockToCategory STC 
INNER JOIN Category C ON STC.CategoryID = C.CategoryID 
WHERE STC.StockID = @StockID

VS

SELECT * FROM Category
WHERE CategoryID IN
    (SELECT CategoryID FROM StockToCategory WHERE StockID = @StockID)

Which is considered the correct (syntactically) and most performant approach and why?

The syntax in the latter example seems more logical to me but my assumption is the JOIN will be faster.

I have looked at the query plans and havent been able to decipher anything from them.

Query Plan 1
Query Plan 2

Maxim Gershkovich
  • 45,951
  • 44
  • 147
  • 243
  • If both can execute, both are syntactically correct. This seems to me like the classic join-versus-subquery issue... – BoltClock Jan 21 '11 at 05:14
  • Well point taken but im sure the SQL experts here will say one is "more" correct then the other. – Maxim Gershkovich Jan 21 '11 at 05:15
  • 1
    Why don't you put both queries into a SSMS query window and execute them together. Make sure to "include actual execution plan" - SSMS will show which query takes up how many percent of the total execution time. If you have 50% : 50% draw - then both are pretty much equal. If one performs much worse, SSMS will show that quite nicely – marc_s Jan 21 '11 at 05:49
  • The queries are the same if you have a unique constraint or primary key on (CategoryID, StockID) in StockToCategory. Otherwise the result of the queries may be different. – Mikael Eriksson Jan 21 '11 at 06:36

4 Answers4

11

The two syntaxes serve different purposes. Using the Join syntax presumes you want something from both the StockToCategory and Category table. If there are multiple entries in the StockToCategory table for each category, the Category table values will be repeated.

Using the IN function presumes that you want only items from the Category whose ID meets some criteria. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.

In your exact example, they will produce the same output however IMO, the later syntax makes your intent (only wanting categories), clearer.

Btw, yet a third syntax which is similar to using the IN function:

Select ...
From Category
Where Exists    (
                Select 1
                From StockToCategory
                Where StockToCategory.CategoryId = Category.CategoryId
                    And StockToCategory.Stock = @StockId
                )
Thomas
  • 63,911
  • 12
  • 95
  • 141
  • For this variant I would argue that "... EXISTS (SELECT * ...)" is a better syntax. – Serguei Jan 21 '11 at 05:35
  • 3
    +1 for pointing out that JOIN may duplicate Category rows, *that's* the real difference between these queries. – Serguei Jan 21 '11 at 05:46
  • @yazanpro - The primary point of this question is correctness of the results. No statement about performance of any of the solutions can be made without more information about the data, the database version in use, indexes etc. Regardless of the approach without evaluating the execution plan and IO statistics we cannot make a definitive statement about which solution will perform better. – Thomas Nov 12 '13 at 15:32
1

Syntactically (semantically too) these are both correct. In terms of performance they are effectively equivalent, in fact I would expect SQL Server to generate the exact same physical plans for these two queries.

Serguei
  • 2,910
  • 3
  • 24
  • 34
  • That was my first gut instinct but they dont! Ill post the query plan online so you can all have a look... – Maxim Gershkovich Jan 21 '11 at 05:16
  • @Maxim: Could you also post the table definitions? (including indices, foreign keys). Also which version of SQL Server is generating those plans? – Serguei Jan 21 '11 at 05:39
0

for sqlite

table device_group_folders contains 10 records

table device_groups contains ~100000 records

INNER JOIN: 31 ms

WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;

WHERE 31 ms

WITH RECURSIVE select_childs(uuid) AS (
    SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;

IN <1 ms

SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
    SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
    UNION ALL
    SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);
kaegoorn48
  • 129
  • 1
  • 2
0

T think There are just two ways to specify the same desired result.

Pankaj Agarwal
  • 11,191
  • 12
  • 43
  • 59