0

I'm trying to understand why the following query only returns one row in spite of LEFT JOIN

with t1(day_partition, entity_id, feature_1) AS (values
  ('2020-05-15', 'id_1', 'x'),
  ('2020-05-15', 'id_2', 'y')
),

t2(day_partition, entity_id, feature_2) AS (values
  ('2020-05-15', 'id_1', 1)
)

SELECT
  t1.day_partition AS day_partition_1,
  t2.day_partition AS day_partition_2,
  t1.entity_id AS entity_id_1,
  t2.entity_id AS entity_id_2
FROM
  t1
LEFT JOIN
  t2
ON
  t1.entity_id = t2.entity_id
WHERE
  t2.day_partition = '2020-05-15'
;

returns

 day_partition_1 | day_partition_2 | entity_id_1 | entity_id_2 
-----------------+-----------------+-------------+-------------
 2020-05-15      | 2020-05-15      | o1          | o1          

However, removing the filter of

WHERE
  t2.day_partition = '2020-05-15'

will return

 day_partition_1 | day_partition_2 | entity_id_1 | entity_id_2 
-----------------+-----------------+-------------+-------------
 2020-05-15      | 2020-05-15      | id_1        | id_1        
 2020-05-15      | NULL            | id_2        | NULL        

I find such behavior unintuitive and what's the rule behind it?

GMB
  • 216,147
  • 25
  • 84
  • 135
zyxue
  • 7,904
  • 5
  • 48
  • 74

2 Answers2

1

This is by design. Conditions in the where clause are mandatory so putting there a condition on the left joined table ends up evicting rows where the left join came back empty. Basically, this turns the left join to an inner join.

You need to put all predicates that relate to columns coming from the left joined table in the on clause of the join:

FROM t1
LEFT JOIN t2
    ON t1.entity_id = t2.entity_id
    AND t2.day_partition = '2020-05-15'

By looking at your resultset, I tend to think that you actually wanted a condition on t1:

FROM t1
LEFT JOIN t2
    ON t1.entity_id = t2.entity_id
WHERE t1.day_partition = '2020-05-15'
GMB
  • 216,147
  • 25
  • 84
  • 135
0

If you put the condition in the join it works as you expect

SELECT
  t1.day_partition AS day_partition_1,
  t2.day_partition AS day_partition_2,
  t1.entity_id AS entity_id_1,
  t2.entity_id AS entity_id_2
FROM
  t1
LEFT JOIN
  t2
ON
  t1.entity_id = t2.entity_id AND t2.day_partition = '2020-05-15'

The query parser does not know what you are thinking. If you filter the data in a where clause it affects all records and not just the records of the joined table.

juergen d
  • 201,996
  • 37
  • 293
  • 362
  • could you please clarify more what do you mean by "The query parser does not know what you are thinking"? – zyxue May 29 '20 at 04:32
  • You asked what kind of sense your query makes. You want to filter only records of the joined table. But the database engine does not know your intentions. It just fitlers all you add in the `where` clause – juergen d May 29 '20 at 04:38