2

I have two tables table_a and table_b in a PostgreSQL 13 database, having UUIDs primary key columns.

table_a has multiple entries, whereas table_b is empty (no entry). The following query returns the expected result, namely entry_count_a larger than 0:

SELECT COUNT(DISTINCT ta.uuid) AS entry_count_a FROM table_a ta

However, the following query returns 0 for both entry_counts:

SELECT COUNT(DISTINCT ta.uuid) AS entry_count_a, COUNT(DISTINCT tb.uuid) AS entry_count_b FROM table_a ta, table_b tb

What is the correct way to write the query, so that entry_count_a contains the correct (expected) value > 0, whereas entry_count_b is 0?

Bonus question: Why does Postgres behave this way?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
DaHoC
  • 314
  • 1
  • 4
  • 14

5 Answers5

2

Your current query is equivalent to the following one:

SELECT COUNT(DISTINCT ta.uuid) AS entry_count_a, 
       COUNT(DISTINCT tb.uuid) AS entry_count_b 
FROM       table_a ta
CROSS JOIN table_b tb

When you apply the cartesian product between two tables, you multiply their cardinality. You get no rows because one of the two tables has cardinality 0, hence 0*n, for any n, is always 0.

If you want to display the two counts correctly, you could use two subqueries as follows:

SELECT
    (SELECT COUNT(DISTINCT uuid) FROM table_a) AS entry_count_a,
    (SELECT COUNT(DISTINCT uuid) FROM table_b) AS entry_count_b

Check the demo here.

lemon
  • 14,875
  • 6
  • 18
  • 38
2

Your query will produce the cartesian product table a x table b, because you didn't established the way you want to relate them. Normally, we use WHERE conditions or JOIN clauses. So if you have n lines in A x 0 lines in B, you will get 0 rows as result with your SQL statement. The easiest way in this case is running two separate querys, one for each table, or using subselects like this:

SELECT COUNT(DISTINCT ta.uuid) AS entry_count_a, SELECT COUNT(DISTINCT tb.uuid) FROM table_b tb) AS entry_count_b FROM table_a ta
Rui Vieira
  • 125
  • 1
  • 4
2

Count 0 is due to the implicit CROSS JOIN in FROM table_a ta, table_b tb, where one empty table cancels out the other one.

Joining is the wrong approach to begin with, as has been pointed out by other answers already. (Besides the side-effect at hand, building a Cartesian product gets very expensive for big tables. Run two separate counts instead.

Since uuid is the the PK in each table, use count(*):

SELECT (SELECT count(*) FROM table_a) AS count_a
     , (SELECT count(*) FROM table_b) AS count_b
;

count(DISTINCT uuid) would be a big waste, since uuid is UNIQUE by definition.

count(uuid) would still be wasteful, since that column is also NOT NULL by definition. Postgres has a separate, faster implementation for count(*), simply counting rows.

About the misplaced join:

About the faster count(*):

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

Workaround I use for now:

SELECT a.count,
       b.count FROM
  (SELECT COUNT(DISTINCT uuid) AS count FROM table_a) a,
  (SELECT COUNT(DISTINCT uuid) AS count FROM table_b) b
DaHoC
  • 314
  • 1
  • 4
  • 14
0

An other way to do it using CROSS JOIN :

with cte_table_a as (
  SELECT COUNT(DISTINCT uuid) AS entry_count_a FROM table_a
),
cte_table_b as (
  SELECT COUNT(DISTINCT uuid) AS entry_count_b FROM table_b
)
select * from cte_table_a
cross join cte_table_b

Or simply :

SELECT * 
from (SELECT COUNT(DISTINCT uuid) AS entry_count_a FROM table_a) a
CROSS JOIN (SELECT COUNT(DISTINCT uuid) AS entry_count_b FROM table_b) b

Demo here

SelVazi
  • 10,028
  • 2
  • 13
  • 29