-1

I got an LEFT JOIN exercise at school:
"List all category names with the number of their products."
Used were two tables from the northwind DB: products (77 rows) and categories (8 rows)

I thought the product table should come first, since the main-data (number of products) will be found there and only the 8 category names will be needed from the joined table. Our teacher argued, that the categories table needs to be the main table, but i still can't understand why.

The two queries are:

    SELECT C.CategoryID, CategoryName, COUNT(ProductID) [Count]
FROM Categories C LEFT JOIN Products P
    ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, CategoryName

and

    SELECT P.CategoryID, CategoryName, COUNT(ProductID) [Count]
FROM Products P LEFT JOIN Categories C
    ON P.CategoryID = C.CategoryID
GROUP BY CategoryName, P.CategoryID

Can anybody explain to me why, in this case, a certain order of used tables matters in terms of theoretical performance?
And if: how so? (does size matter?;))

Zos
  • 1
  • 1
  • I think the performance is dependend on the size of the starting table. Although it could be that the database engine does optimization in the background. When you start with a small dataset and join the dataset with more rows, then it should in theory be a litte faster since the index is only called once for each category. – Noixes Jun 10 '22 at 09:30
  • Any question about performance is irrelevant because they are not comparable. Although any optimiser worth it's salt will find the optimal plan, so if your queries were comparable (e.g if both used `INNER JOIN`) then the optimiser would almost certain use the same plan for both queries. You can force the order of execution using [`STRAIGHT JOIN`](https://dev.mysql.com/doc/refman/8.0/en/join.html), but I can probably count on one hand the number of times I've ever had to force the order because I know better than the optimiser – GarethD Jun 10 '22 at 09:34
  • Duplicate of [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/q/38549/3404097) – philipxy Jun 10 '22 at 10:16
  • Is this correct syntax?? `[Count]` – Rick James Jun 19 '22 at 18:28
  • We cannot answer the question without knowing which table `CategoryName` is in. – Rick James Jun 19 '22 at 18:29
  • @philipxy - It may be a duplicate, but there are two problems with suggesting it. (1) This Question specifies `[mysql]`; that Question is more generic. MySQL ignores "inner" and "outer". (2) "LEFT" is that this question is about. – Rick James Jun 19 '22 at 18:33
  • @RickJames Right, it's a duplicate. So what is the 'but' for? (RhetoricaL.) If they want a more specific answer they should ask a more specific question. I don't know what you are trying to say by 'ignores "inner" and "outer"'. – philipxy Jun 19 '22 at 20:10
  • What is "theoretical performance"? (The result the answer has to give? If for some model of performance, which one, and why?) What is "the main table"? What is "should"? In order for what? How is applying the definition of left join not telling you? [How much research effort is expected?](https://meta.stackoverflow.com/q/261592/3404097) [mre] PS Once you are clear, if you don't explain what you expect but also why you expect it then you are just asking us to (re)write yet another presentation with no idea what you now misunderstand or do or don't understand. [ask] [Help] – philipxy Jun 19 '22 at 20:32

2 Answers2

2

The name of the exercise tells yo what is the first table in your case. "List all category names with the number of their products." So get the all category names. Category names is what you HAVE TO SHOW - ALL OF THEM. You want to show all of them regardless of the fact is there a matching CategoryID in the Products table.

For example, if you want to show all product names with number of their categories then you want to show all product names regardless if there exists matching ProductID in Categories table.

Here is the demo

This demo shows you what the two queries will return if we have 3 categories and one product. It is not the best demo in the world but it does the trick I believe.

The tables:

create table Categories (CategoryID int, CategoryName varchar(20))

create table Products (ProductID int, CategoryID int)

The data:

insert into Categories values(1, 'Cat1');
insert into Categories values(2, 'Cat2');
insert into Categories values(3, 'Cat3');
insert into Products values(1, 1);

Query1:

SELECT C.CategoryID, CategoryName, COUNT(ProductID) as Cnt
FROM Categories C 
LEFT JOIN Products P ON C.CategoryID = P.CategoryID
GROUP BY C.CategoryID, CategoryName

Result1:

CategoryID  CategoryName    Cnt
1           Cat1            1
2           Cat2            0
3           Cat3            0

Query2:

SELECT P.CategoryID, CategoryName, COUNT(ProductID) as Cnt
FROM Products P 
LEFT JOIN Categories C ON P.CategoryID = C.CategoryID
GROUP BY CategoryName, P.CategoryID

Result2:

CategoryID  CategoryName    Cnt
1           Cat1            1

I see in your question that you say: "Used were two tables from the northwind DB: products (77 rows) and categories (8 rows)" So maybe it is strange now for you how can my example be like this and yours "since the results of both queries are obviousely the same" ?

Here is the demo that will show you how it can be the same with different set of data.

VBoka
  • 8,995
  • 3
  • 16
  • 24
  • the question was: is there a theoretical difference in performance? (since the results of both queries are obviousely the same) – Zos Jun 10 '22 at 09:16
  • 5
    The result is not the same , maybe it is the same for your data. – VBoka Jun 10 '22 at 09:16
  • 2
    @Zos if you have a category without product, your query will not show it has 0 product. – Blag Jun 10 '22 at 09:18
  • @VBoka the way i thought about it (i am VERY new to sql) was that i either: -get 8 category names from the category-id and then (8 times) jump to the product table to get the number products. -get the product numer of every category-id and then (8 times) jump to the categories table to get the category names – Zos Jun 10 '22 at 10:00
  • @Zos Please clarify via edits, not comments. Please delete/flag obsolete comments. PS The queries do not define the same function of inputs. – philipxy Jun 19 '22 at 20:31
0

As an aside, here is another way to get the desired results.

SELECT C.CategoryID, C.CategoryName
       ( SELECT COUNT(*) 
            FROM Products AS P
            WHERE P.CategoryID = c.CategoryID
       ) AS "Count"
   FROM Categories AS C

The performance will be about the same as the 'correct' LEFT JOIN formulation.

A further note: COUNT(x) does the extra check to see that x IS NOT NULL; COUNT(*) simply counts the number of relevant rows.

In some other situation, you may need COUNT(DISTINCT productID); I suspect you do not need it in this case.

Rick James
  • 135,179
  • 13
  • 127
  • 222