42

Using PostgreSQL, I have a number of queries that look like this:

SELECT <col 1>, <col 2>
     , (SELECT sum(<col x>)
        FROM   <otherTable> 
        WHERE  <other table foreignkeyCol>=<this table keycol>) AS <col 3>
FROM   <tbl>

Given that the sub-select will be identical in every case, is there a way to store that sub-select as a pseudo-column in the table? Essentially, I want to be able to select a column from table A that is a sum of a specific column from table B where the records are related. Is this possible?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ibrewster
  • 3,482
  • 5
  • 42
  • 54

4 Answers4

92

Is there a way to store that sub-select as a pseudo-column in the table?

A VIEW like has been advised is a perfectly valid solution. Go for it.

But another way fits your question even more closely: a "computed field" or "generated column". STORED generated columns have been added with Postgres 11. See:

For older versions, or if you prefer VIRTUAL generated columns (not yet implemented, up to Postgres 15), you emulate the functionality with a function that takes the table type as parameter.

Consider this test case, derived from your description:

CREATE TABLE tbl_a (a_id int, col1 int, col2 int);
INSERT INTO tbl_a VALUES (1,1,1), (2,2,2), (3,3,3), (4,4,4);

CREATE TABLE tbl_b (b_id int, a_id int, colx int);
INSERT INTO tbl_b VALUES
  (1,1,5),  (2,1,5),  (3,1,1)
, (4,2,8),  (5,2,8),  (6,2,6)
, (7,3,11), (8,3,11), (9,3,11);

Create function that emulates col3:

CREATE FUNCTION col3(tbl_a)
  RETURNS int8
  LANGUAGE sql STABLE AS
$func$
SELECT sum(colx)
FROM   tbl_b b
WHERE  b.a_id = $1.a_id
$func$;

Now you can query:

SELECT a_id, col1, col2, tbl_a.col3
FROM   tbl_a;

Or even:

SELECT *, a.col3 FROM tbl_a a;

Note how I wrote tbl_a.col3 / a.col3, not just col3. This is essential.

Unlike a "virtual column" in Oracle it is not included automatically in a SELECT * FROM tbl_a. You could use a VIEW for that.

Why does this work?

The common way to reference a table column is with attribute notation:

SELECT tbl_a.col1 FROM tbl_a;

The common way to call a function is with functional notation:

SELECT col3(tbl_a);

Generally, it's best to stick to these canonical ways, which agree with the SQL standard.

But Postgres also allows attribute notation. These work as well:

SELECT col1(tbl_a) FROM tbl_a;
SELECT tbl_a.col3;

More about that in the manual.
You probably see by now, where this is going. This looks like you would add an extra column of table tbl_a while col3() is actually a function that takes the current row of tbl_a (or its alias) as row type argument and computes a value.

SELECT *, a.col3
FROM   tbl_a AS a;

If there is an actual column col3 it takes priority and the system does not look for a function of that name taking the row tbl_a as parameter.

The "beauty" of it: you can add or drop columns from tbl_a and the last query will dynamically return all current columns, where a view would only return such columns that existed at creation time (early binding vs. late binding of *).
Of course, you have to drop the depending function before you can drop the table now. And you have to take care not to invalidate the function when making changes to the table.

I still wouldn't use it. It's too surprising to the innocent reader.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 9
    The function should really be marked `STABLE` (meaning that if you call it with the same arguments more than once during the execution of a single query it will return the same value), not `IMMUTABLE` (meaning that it will *always* return the same value for the same arguments, regardless of database contents or the passage of time). One thing this will do is to prevent usage of the function in an index, which you would want because otherwise the index would become corrupted if changes are made to tbl_b. `IMMUTABLE` would be OK if you were only using values from the row passed as a parameter. – kgrittn Jun 23 '12 at 16:04
  • @kgrittn: Of course, I amended the function declaration accordingly. Forgot that another table is involved. – Erwin Brandstetter Jun 23 '12 at 16:10
  • @ErwinBrandstetter - but if you do a \d tbl_a; or a SELECT * FROM tbl_a;, you can't see the "persisted/computed/calculated/generated" column? Also, I would be grateful if you could check out my answer [here](http://dba.stackexchange.com/a/136686/34007) - maybe I'm not using your code correctly? Or is this a bug in PostgreSQL? – Vérace Apr 27 '16 at 03:15
  • @Vérace: The function is not automatically part of the table, you have to spell it out like instructed. I clarified some more. – Erwin Brandstetter Apr 27 '16 at 12:15
  • @ErwinBrandstetter, I get a field does not exist error with such field notation... https://stackoverflow.com/questions/56816432/pgsql-calling-function-with-column-notation-error – Leo Jun 29 '19 at 09:33
3

Apparently this is handled with views, as per lion's comment. So in my case, I used the command:

CREATE VIEW <viewname> AS
SELECT *, (SELECT sum(<col x>)
FROM   <otherTable
WHERE  <otherTable foreignkeyCol>=<thisTable keycol>) AS <col 3>
FROM   <tablename>

which essentially gives me another table including the desired column.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ibrewster
  • 3,482
  • 5
  • 42
  • 54
3

Apart from a view, you can create a function for the sum.

CREATE FUNCTION sum_other_table( key type_of_key ) RETURNS bigint
AS $$ SELECT sum( col_x ) FROM table_1 where table_1.key = key $$ LANGUAGE SQL;

and then use it as your aggregator:

SELECT col_1, col_2, sum_other_table( key ) AS col_3
FROM table_2 WHERE table_2.key = key;

Note that the return type of sum_other_table() depends on the type of the column you're summing up.

Johann Oskarsson
  • 776
  • 6
  • 15
3

There are three answers so far, all of which work. Any one of them could be a "best solution" depending on circumstances. With small tables the performance should be pretty close, but none of them are likely to scale well to tables with millions of rows. The fastest way to get the desired results with a large data set would probably be (using Erwin's setup):

SELECT a_id, col1, col2, sum(colx)
FROM tbl_a LEFT JOIN tbl_b b using(a_id)
GROUP BY a_id, col1, col2;

If a_id is declared as a primary key, and this is run under 9.1 or later, the GROUP BY clause can be simplified because col1 and col2 are functionally dependent on a_id.

SELECT a_id, col1, col2, sum(colx)
FROM tbl_a LEFT JOIN tbl_b b using(a_id)
GROUP BY a_id;

The view could be defined this way and it would scale, but I don't think that all the same execution paths will be considered for the approaches using functions, so the fastest execution path might not be used.

kgrittn
  • 18,113
  • 3
  • 39
  • 47
  • 1
    Simple functions in LANGUAGE SQL will be inlined and the query optimizer may come up with the best solution anyway. Whether that happens in this particular case is another question. – Johann Oskarsson Jul 11 '12 at 08:59
  • 1
    @JohannOskarsson: Even if the function is inlined, my version results in a *correlated subquery* instead of a *join*. I doubt the optimizer dares to transform that. – Erwin Brandstetter Jan 11 '13 at 15:17