2

I'm trying to get the unique values from two tables who both just have one column called domain.

DDL:

create table domains_1 (domain varchar);
create table domains_2 (domain varchar);

DML:

insert into domains_1 values ('example_1.com'), ('example_2.com');
insert into domains_2 values ('example_2.com'), ('example_3.com');

There are a few ways to do this and I decide to use full outer join.

select case when a.domain is null then b.domain
            when b.domain is null then a.domain
       end as unique_domains
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain;

To my surprise, there is null in the result besides the unique domains.

enter image description here

I can add one more layer of select to exclude the null as this:

select * from
(select case when a.domain is null then b.domain
            when b.domain is null then a.domain
       end as unique_domains
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain) t
where unique_domains is not null;

How can this null be present in the result in the first place? Is there any better way to remove the null from the result?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
JamesWang
  • 1,175
  • 3
  • 14
  • 32
  • No. I'm very clear of outer join on this part, but my question is how to find a better way to exclude the null which corresponds to the matching case. – JamesWang Mar 08 '21 at 00:57
  • Why should it be present? If you don't say why then you're just asking for yet another definition of the parts of the language you are using while giving us no idea what you don't understand. Also what is the 1st subexpression that doesn't return what you expect (and why)? Also "the unique values from two tables" is vague. (But find out what you misunderstand before you go on to a query for your overall goal.) – philipxy Mar 08 '21 at 06:18
  • @TimothyG. That is not a correct definition of what full outer join returns. Observe that the sadly highly-upvoted accepted answer you quote from besides being extremely poor does not even address the general case. (The only clear complete definition on that page is mine.) (FULL JOIN ON returns INNER JOIN ON rows UNION ALL unmatched left & right table rows extended by NULLs. ) – philipxy Mar 08 '21 at 06:30

2 Answers2

1

Your CASE expression has no ELSE, so it defaults to null:

case when a.domain is null then b.domain
     when b.domain is null then a.domain
     ELSE NULL -- implicitly
end as unique_domains

The value 'example_2.com' has a match so both a.domain and b.domain equal ''example_2.com'' and are not null. So, both WHEN don't match and ELSE NULL is applied.

As to "a better way": I'd probably use

select coalesce(a.domain, b.domain) as domain
from domains_1 as a full outer join domains_2 as b on a.domain = b.domain
where a.domain is null or b.domain is null;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • is there any way that I can exclude this `null` from the result? that's my ultimate goal. Thanks for pointing out the `else` case. – JamesWang Mar 08 '21 at 00:54
  • It seems you want to show domains that only exist in one of the tables, not in both. This is what my query does. Yours creates a null for matches, mine excludes matches. – Thorsten Kettner Mar 08 '21 at 00:59
  • got it. I have accepted your answer. thanks. – JamesWang Mar 08 '21 at 01:00
0

A CASE expression in the SELECT list cannot remove rows (like you seem to desire). That has to happen in a JOIN or WHERE clause.

Since your column names conveniently align, work with the USING keyword in the join clause to simplify the job.

To get "unique domains" (incl. 'example_2.com' in your example):

SELECT domain
FROM   domains_1
FULL   JOIN domains_2 USING (domain);

To get domains without match in other table respectively (excl. 'example_2.com' in your example):

SELECT domain
FROM   domains_1 a
FULL   JOIN domains_2 b USING (domain)
WHERE  a.domain IS NULL OR b.domain IS NULL;

db<>fiddle here

The manual:

[...] USING implies that only one of each pair of equivalent columns will be included in the join output, not both.

But you can still reference each source column by table-qualifying it, as demonstrated.

There are various other query techniques to eliminate rows with a match in another table:

Notably, neither of the above queries removes duplicates within each table unless there is a match in the other table.

A fancy equivalent for the second query, but without possible duplicates within each table:

(TABLE domains_1 EXCEPT TABLE domains_2)
UNION ALL
(TABLE domains_2 EXCEPT TABLE domains_1);

This variant only kills one duplicate for every match in the other table, before removing remaining duplicates in the result. Subtly different, yet:

(TABLE domains_1 EXCEPT ALL TABLE domains_2)
UNION
(TABLE domains_2 EXCEPT ALL TABLE domains_1);

About the short syntax:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228