0

The following query is used to count the number of rows in two unrelated tables in a single query.

  With t1 as (Select 1 
              Union Select  2 
              Union Select 3),
  t2 as (Select 'A' 
         Union Select 'B')

  Select (Select count(*) from t1), (Select count(*) from t2)

Is there a better way to avoid the two select statements in the select query.

The output should be

3 2

Any construct that is specific to Postgres will also do.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Gopal
  • 1,372
  • 2
  • 16
  • 32
  • As soon as they are **unrelated** (as you said) - there is nothing wrong to perform a single query. 1 query instead of 2 is neither better or faster. – zerkms Dec 31 '12 at 07:21
  • Do you need it to work for mysql or postgres? Cause I don't think MySQL can do anything even similar to "with as (select...)"
    – Andreas Wederbrand Dec 31 '12 at 07:24
  • @zerkms: Just want to know the alternative approaches available for this. – Gopal Dec 31 '12 at 07:30
  • @Andreas: Thanks for mentioning. Removed the MySQL tag. It's for postgres only. – Gopal Dec 31 '12 at 07:30
  • 2
    @Gopal: there is no reason for that. Tables are unrelated. Perform 2 queries – zerkms Dec 31 '12 at 07:35
  • @zerkms: Why should he perform 2 queries? In case he wants the result in one row, he'd better perform one single queries (containint two count-selects, of course) – alzaimar Dec 31 '12 at 09:14
  • @alzaimar: why "better"? Don't perform premature optimizations. At first - write the readable and easy to understand code. Only good and straightforward code is maintainable and may be optimized later (if there is a technical reason for that). Keep in mind that in most cases each optimization leads to tricky code. – zerkms Dec 31 '12 at 09:15
  • 2
    @zerkms: At first, it's not "premature" but his own suggestion. Second, my comment includes a conditional sentence: IF he wants them in ONE line, he'd better (=MUST) execute one query. Third: There are three solutions to what he wants to achieve: the one he posted or a UNION of both counts. These return one result set. The third solution would return two result sets, and just executes the two counts after another. As long as we don't know exactly how the result is handled, all three possibilities are equal in terms of complexity, aren't they? – alzaimar Dec 31 '12 at 09:23
  • @alzaimar: Thanks for your understanding. I also have control over how the two tables are structured. Hence, another possible solution is to add another column in both the tables as row number, and do an full outer join on that column, which will give me the result that I want. – Gopal Dec 31 '12 at 09:38
  • @Gopal: The count from a `FULL OUTER JOIN` is something *completely* different from two individual counts. You are aware of that? – Erwin Brandstetter Dec 31 '12 at 11:38
  • @ErwinBrandstetter: Yes, I am aware about it. I want a query that just displays the content of two tables individually, to find their count. To make my intent explicit, I have updated the column to have non-overlapping values. – Gopal Dec 31 '12 at 12:30

3 Answers3

3

Simple and correct

First of all, you can simplify your test case with a VALUES expression instead of the more verbose UNION ALL SELECT.
You'd need explicit type casts in the first row, if data types are not the default integer and text ..

Second, a FULL OUTER JOIN is utterly pointless. All it does is make your query slower. And if any row has more than one match in the other table, it gets multiplied in the count.

WITH t1(col1, col2) AS (VALUES (1, 1),   (2, 2),   (3, 3))
    ,t2(col1, col2) AS (VALUES (1, 'A'), (2, 'B'), (2, 'C'))  -- 2nd row for "2"
SELECT count(t1.*), count(t2.*)
FROM t1
FULL OUTER JOIN t2 USING (col1);

Yields:

4   3

which is wrong.

WITH t1(col2) AS (VALUES (1),   (2),   ( 3))
    ,t2(col2) AS (VALUES ('A'), ('B'), ('C'))
SELECT (SELECT count(*) FROM t1) AS t1_ct
      ,(SELECT count(*) FROM t1) AS t2_ct;

Yields:

3   3

which is correct, besides being simpler and faster.
Admittedly, with row_number() freshly applied, there can be no dupes. But it's just a waste of time.

Performance

Counting is relatively slow for big tables. If you don't need an exact count but can live with an estimate, you can get this extremely fast:

SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'myschema.mytable'::regclass;

I quote the manual here:

It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.

More details in this related answer.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hi Erwin, Thanks for your detailed reply. VALUES expression is awesome cool and useful. Further, the col1 in both the tables are generated only for the purpose of linking the two tables, and hence there will be no duplicates in that column. Also, counting individual tables is mentioned in the question itself. The expectation is to come up with an alternative approach. Your suggestion to find the estimate count is awesome. In my case, however, I need to find the exact count. In all, thanks again for the detailed response and it is very informative. – Gopal Dec 31 '12 at 12:57
  • 3
    @Gopal: There is *no* superior alternative to counting both tables individually. – Erwin Brandstetter Dec 31 '12 at 13:29
0

Counting is an extremely expensive operation (in terms of CPU load). Try to avoid whenever possible. If you need to get the total number of rows of a table without any condition, some RDBMS offer a workaround, e.g. with MSSQL, it's like this:

select SUM(row_count) as Total_Rows
 from sys.dm_db_partition_stats
where object_name(object_id) = 'YourTableName' 
  and index_id < 2

An alternative could be to maintain your count in a separate table, e.g. if you need the total number grouped by a certain value. You would increase and decrease the counts using a trigger. This is recommended if you (e.g.) have to show a count on the main form all the time (active users, active posts per area etc.

alzaimar
  • 4,572
  • 1
  • 16
  • 30
  • Thanks alzaimar. The limitations of this solution for my problem 1. the tables are not persistent. (they are just CTEs) and 2. creating a new table for count and maintaining trigger is over engineering for my case, as the load will not be so huge in near future. – Gopal Dec 31 '12 at 12:47
  • No problem. Have a happy new year. – alzaimar Dec 31 '12 at 12:57
  • Thanks alzaimar and wish you the same. – Gopal Dec 31 '12 at 12:58
-2

Introduce relationship between the two tables by adding a row_number column and do a full outer join.

  With t1 as (Select 1 as Col1, 1 
              Union Select  2, 2 
              Union Select 3, 3),
  t2 as (Select -1 as Col1, 'A' 
         Union Select -2, 'B')

  Select count(t1.*), count(t2.*) from t1 full outer join t2 on t1.Col1 = t2.Col1
Gopal
  • 1,372
  • 2
  • 16
  • 32
  • In case the column `Col1` is an artificial connection between the two tables, I do not recommend this solution. It introduces an completely unnecessary join operation. Stick to your solution, it's completely OK. – alzaimar Dec 31 '12 at 11:22
  • I want to add: Stick to original solution using `select select (count(*)..., (select count(*)...)` – alzaimar Dec 31 '12 at 11:32
  • Hi alzaimar, there is a reason why I go for a single query, as I need to add few predicates in my original problem. If I keep two separate queries, then I need to duplicate those predicates in both the tables. Though the problem statement says that the tables are unrelated, they are not completely so. Just to simplify the case, I mentioned this example. Anyway, thanks for your suggestion. – Gopal Dec 31 '12 at 13:01