5

In Postgres, is there a way to perform a left join between tables linked by a junction table, with some filtering on the linked table?

Say, I have two tables, humans and pets, and I want to perform a query where I have the human ID, and the pet name. If the human ID exists, but they don't have a pet with that name, I still want the human's row to be returned.

If I had a FK relationship from pets to humans, this would work:

select h.*, p.*
from humans as h
left join pets as p on p.human_id = h.id and p.name = 'fluffy'
where h.id = 13

and I'd get a row with human 13's details, and fluffy's values. In addition, if human 13 didn't have a pet named 'fluffy', I'd get a row with human 13's values, and empty values for the pet's columns.

BUT, I don't have a direct FK relationship, I have a junction table between humans and pets, so I'm trying a query like:

select h.*, p.*
from humans as h
left join humans_pets_junction as j on j.human_id = h.id
left join pets as p on j.pet_id = p.id and p.name = 'fluffy'
where h.id = 13

Which returns rows for all of human 13's pets, with empty columns except for fluffy's row.

If I add p.name = 'fluffy' to the WHERE clause, that filters out all the empty rows, but also means I get 0 rows if human 13 doesn't have a pet named fluffy at all.

Is there a way to replicate the behavior of the FK-style left join, but when used with a junction table?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Jordan0Day
  • 1,386
  • 4
  • 15
  • 25

2 Answers2

4

One method is to do the comparison in the where clause:

select h.*, p.*
from humans as h left join
     humans_pets_junction as j
     on j.human_id = h.id left join
     pets as p
     on j.pet_id = p.id and p.name = 'fluffy'
where h.id = 13 and (p.name = 'fluffy' or p.id is null);

Alternatively, join the junction table and the pets table as a subquery or CTE:

select h.*, p.*
from humans h left join
     (select j.*
      from humans_pets_junction j join
           pets p
           on j.pet_id = p.id and p.name = 'fluffy'
     ) pj
     on pj.human_id = h.id
where h.id = 13;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The 1st query has a built-in trap for setups where `pets.name` can be null. The expression `p.name is null` *after* the `LEFT JOIN` cannot tell cases where a human has no associated pet from cases where the pet's name is missing (null). Use `.. or p.id is null)` to fix. No disadvantage. Also, both queries copy the invalid reference `r.*` from the question. – Erwin Brandstetter Feb 24 '15 at 00:54
  • @ErwinBrandstetter . . . That's true. I changed it to the `id`. – Gordon Linoff Feb 24 '15 at 01:00
  • Thanks for this answer. I like the first form better, but I can't seem to get it to behave similarly to my foreign key example above (that is, if human 13 doesn't have a pet named 'fluffy', now rows are returned, rather than a row with the human columns and no pet columns). The subquery form works, though. – Jordan0Day Feb 24 '15 at 15:33
  • Err, that should read "no rows are returned". I wish SO wouldn't have such silly restrictions like not being able to edit comments after five minutes. – Jordan0Day Feb 24 '15 at 15:42
1

In Postgres you can use parentheses to prioritize JOIN order. You do not need a subquery:

SELECT h.*, p.id AS p_id, p.name AS pet_name
FROM   humans  h
LEFT   JOIN (pets p
       JOIN  humans_pets_junction j ON p.name = 'fluffy'
                                   AND j.pet_id = p.id
                                   AND j.human_id = 13) ON TRUE
WHERE  h.id = 13;

Per documentation:

Parentheses can be used around JOIN clauses to control the join order. In the absence of parentheses, JOIN clauses nest left-to-right.

I added the predicate j.human_id = 13 to the join between your junction table and the pets to eliminate irrelevant rows at the earliest opportunity. The outer LEFT JOIN only needs the dummy condition ON TRUE.

SQL Fiddle.

Aside 1: I assume you are aware that you have a textbook implementation of a n:m (many-to-many) relationship?

Aside 2: The unfortunate naming convention in the example makes it necessary to deal out column aliases. Don't use "id" and "name" as column names in your actual tables to avoid such conflicts. Use proper names like "pet_id", "human_id" etc.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228