1

I've been using PosgreSQL almost daily for over 11 years now, and today I wrote what I though was a very simple query with a LEFT JOIN that doesn't behave the way that I expected. I'm lucky I caught the bug, but it has me concerned that there is something fundamental here that I a missing. Please look at the following to be able reproduce.

CREATE TEMP TABLE tbl_a(date date);
INSERT INTO tbl_a VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03'), ('2022-01-04');
CREATE TEMP TABLE sale(date date, item_id int);
INSERT INTO sale VALUES ('2022-01-02', 2), ('2022-01-03', 2), ('2022-01-04', 3);

When I run the following query I get the results I expect with a LEFT JOIN

SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN sale AS s ON t.date = s.date;
+------------+---------+
|    date    | item_id |
+------------+---------+
| 2022-01-01 |    NULL |
| 2022-01-02 |       2 |
| 2022-01-03 |       2 |
| 2022-01-04 |       3 |
+------------+---------+

I get every record in tbl_a and since I have no sale records for 2022-01-01, I get a NULL.

However, when I add a WHERE to the query I get an unexpected result.

SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN sale AS s ON t.date = s.date WHERE s.item_id = 2;
+------------+---------+
|    date    | item_id |
+------------+---------+
| 2022-01-02 |       2 |
| 2022-01-03 |       2 |
+------------+---------+

Note: there is no record for 2022-01-01 or 2022-01-04.

However, if I rewrite the query with a CTE, I get the results I expect.

WITH s AS (select * from sale WHERE item_id = 2) SELECT t.*, s.item_id FROM tbl_a AS t LEFT JOIN s ON t.date = s.date ORDER BY t.date;
+------------+---------+
|    date    | item_id |
+------------+---------+
| 2022-01-01 |    NULL |
| 2022-01-02 |       2 |
| 2022-01-03 |       2 |
| 2022-01-04 |    NULL |
+------------+---------+

My question is why do the above two queries yield different results.

Note:

SELECT version();
+-----------------------------------------------------------------------------------------------------------------------------------+
|                                                              version                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------+
| PostgreSQL 13.7 (Ubuntu 13.7-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit |
+-----------------------------------------------------------------------------------------------------------------------------------+
David S
  • 12,967
  • 12
  • 55
  • 93
  • 1
    LEFT JOIN returns INNER JOIN rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. After a LEFT JOIN a WHERE, INNER JOIN or HAVING that requires a right [sic] table column to be not NULL removes any rows with introduced NULLs, ie leaves only INNER JOIN rows, ie "turns OUTER JOIN into INNER JOIN". You have that. PS This is a faq. But one must write many clear, concise & precise phrasings of one's question/problem/goal to search. – philipxy May 25 '22 at 18:07
  • @philipxy I always try to write very clear and concise questions and include version info and working examples. As you can see, like you I'm a long time user with over 10K in rep points. But, I'm always open to suggestions on how I can be a better community member. How do you feel like I could have made this question better. I end with the question that why do these return different results. IMO, they seem equivalent. Thank you in advance for your help. – David S May 25 '22 at 18:24
  • Your rep & time surprised me because this post has no [mre] explanation & no "concise & precise phrasings of one's question/problem/goal" & (nevertheless) googling the title gives immediate solutions. So I commented to help you debug, search & ask in the future. PS If you don't say why you expect what you expect then you are just asking for yet another presentation of the language with no details about what you already misunderstand or don't or do understand. And if you don't include justification referencing authoritative documentation then research is lacking. PS 2 standard comments follow. – philipxy May 25 '22 at 18:55
  • Please in code questions give a [mre]--cut & paste & runnable code & example input; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy May 25 '22 at 18:56
  • Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect research in posts. SO/SE search is poor & literal & sometimes limited to titles, but read the help. Google re googling/searching, including in Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [ask] [Help] – philipxy May 25 '22 at 18:56
  • @philipxy I do not know if you were the one that marked the question as a duplicated one. The problem dosen't have any relation with the outer join one. The problem here is in the common table expression. – Felipe Colussi-oliva May 25 '22 at 19:53
  • 1
    @FelipeColussi-oliva The post is unclear. Seems the 2nd query is wrong compared to the 1st for the reasons in my comment & the duplicate. The asker agreed it was a duplicate, that's what the community bot vote indicates. I don't know what you mean by the problem being in the CTE. The asker says the 1st & the CTE give what they expect. It doesn't say they should give the same thing. Presumably they mean, each gives what they expect of it. I agree the final question about "above 2 queries" is confusing. The asker should edit to clarify exactly what they're asking re exactly what queries. – philipxy May 25 '22 at 20:03

1 Answers1

1

Thats due to the order of execution from postgres.

Whenever you run the 1st query you are joining both tables then filtering it with the where item_id = 2).

In the second query you are filtering tbl_a then joining the result with b.

The equivalent of the 1st query would be something like:

WITH s AS 
    (select * from sale)
SELECT t.*, s.item_id 
FROM tbl_a AS t 
LEFT JOIN s ON t.date = s.date 
WHERE s.item_id = 2
ORDER BY t.date;