0

I'm working with left joins that are creating NULL values. Let's say I'm interested in how many users have a basic plan or no plan at all (indicated by a NULL record in the left join)

SELECT u.id, s.plan_type 
FROM users as u
LEFT JOIN subscriptions as s on s.user_id = u.id
WHERE s.plan_type IN (NULL, 'basic')

The above does not work as intended, likely because it doesn't recognize it as NULL. I'd have to write something like WHERE s.plan_type IS NULL OR s.plan_type = 'basic'

I was wondering if there was a short hand for that, or if you always have to use IS NULL to identify NULL values

Nicholas Hassan
  • 949
  • 2
  • 10
  • 27
  • 2
    In a few words, no, you have to do the `OR` what you said. Why is it that bad? Maybe the explanation is that `NULL` value cannot be compared with equal operator (e.g. `NULL = NULL` return false), and `IN` operator likely performs an equal comparison for each element – James Aug 31 '21 at 23:04
  • Not sure how it works in POSTGRES but one option would be to use a coalesce... WHERE coalesce(s.plan_type, '-1') IN ('-1', 'basic'). Essentially, replace NULL with -1, then you can check for -1. – Aron Aug 31 '21 at 23:15
  • @JaimeDrq One reason it is bad is that an IN list can make more efficient use of indexes than an OR currently can. (Maybe we will fix that some day) – jjanes Sep 01 '21 at 13:53
  • I trust you @jjanes, but being honest I am not fully confident about that – James Sep 01 '21 at 15:38

2 Answers2

3

s.plan_type IN (NULL, 'basic') is just shorthand for (s.plan_type = NULL OR s.plan_type = 'basic').

Anything = NULL always returns NULL, not true. And only true qualifies in a WHERE clause. So rephrase:

SELECT u.id, s.plan_type 
FROM   users u
LEFT   JOIN subscriptions s ON s.user_id = u.id
WHERE (s.plan_type = 'basic' OR
       s.plan_type IS NULL);

To be precise, that's only correct if subscriptions.plan_type is defined NOT NULL. Failing that, the resulting NULL value could be just that, not a missing plan.

This alternative formulation does not share the same ambiguity for a resulting NULL:

SELECT u.id, s.plan_type
FROM   users u
JOIN   subscriptions s ON s.user_id = u.id
WHERE  s.plan_type = 'basic'
UNION ALL
SELECT u.id, NULL  -- always means missing
FROM   users u
WHERE  NOT EXISTS (SELECT FROM subscriptions WHERE user_id = u.id);

And it may be faster, because Postgres (like most RDBMS) often doesn't perform well with OR'ed predicates. See about "ugly OR":

If referential integrity is guaranteed with a FK constraint from subscriptions.user_id to users.id, and s.user_id is defined NOT NULL you can omit the users table from the first SELECT:

SELECT user_id AS id, plan_type
FROM   subscriptions
WHERE  plan_type = 'basic'
UNION ALL
SELECT id, NULL
FROM   users u
WHERE  NOT EXISTS (SELECT FROM subscriptions WHERE user_id = u.id);

Might be fastest.

Aside: No array is involved here. You show the form of IN using a list. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
3

If your table isn't so huge, you could use coalesce to handle null before matching

select u.id, s.plan_type 
from users as u
left join subscriptions as s on s.user_id = u.id
where coalesce(s.plan_type,'basic') = 'basic';

Note that the use of function on the left side of the where clause will make the query unsargable.

Radagast
  • 5,102
  • 3
  • 12
  • 27