2

Since Postgres doesn't appear to have the functionality of multiple resultsets per UDF(SQL Server does). What is the best way to return query-wide-singleton values? For example when paginating the results of a FTS search, it simplifies logic if we could get the amount of results that match the query (without having to iterate over all the pages using pagination).

Approach one: Stick the post_count in the SELECT list. cons: duplication

Approach two: Write a separate UDF to get the post_count cons: multiple UDF calls ( I use this approach, and it doubles the latency for being able to present the first page).

Approach three: Use array for the result set and put the post_count at the top level as a sibling to the result-set cons : much slower - perhaps this is due to the array_agg() function (yes I tried this approach).

So is there a more pragmatic solution to this problem, and if not is there anything in development pipeline to address this?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Hassan Syed
  • 20,075
  • 11
  • 87
  • 171
  • *Please* stop creating the `postgres-9.1` tag. `postgresql-9.1` exists, and is the proper spelling to boot. – Charles Sep 05 '12 at 16:26
  • The problem is that PostgreSQL has *functions* rather than *stored procedures*. The closest you can come in a function is to return a `SETUP refcursor`, but then you still need to have something pass the cursors and display the results; and if that's another function then you're back in the same boat. Developers have been kicking around ideas for how to provide stored procedures (in addition to functions), but there are some technical problems related to transaction management that haven't been sorted out. – kgrittn Sep 05 '12 at 19:56
  • @kgrittn potentially confusing typo; that'd be `SETOF refcursor` not `SETUP refcursor` – Craig Ringer Sep 06 '12 at 00:55
  • I prefer to have the stored proc return `SETOF refcursor` and then `FETCH ALL FROM '` for each returned cursor to get results. – Craig Ringer Sep 06 '12 at 00:56
  • @CraigRinger: Eek. I wish I had caught that typo within the five minute window -- or that you could edit comments. Ah, well, I did indeed mean `SETOF refcursor`. – kgrittn Sep 06 '12 at 02:00

1 Answers1

1

I ran into this problem myself repeatedly and have not found the one solution.

My latest approach was to define the first row of the returned SET to be meta-data. You don't have that one in your list of approaches, yet. The application uses the first row for bookkeeping. Actual data starts with the second row.

The obvious weakness: you have to make do with whatever row definition you have to squeeze in your meta-data. A simple total count will fit into any numeric or string type, though.
And you have to special-case the first row in the application, of course.

This simple example returns rows from a table foo defined as

CREATE TABLE foo (
  foo_id serial PRIMARY KEY
 ,foo    text
 );

Page-size is 20 rows, pre-set per default in the function header:

CREATE OR REPLACE FUNCTION f_paginate(_max_id int, _limit int = 20
                                                 , _offset int = 0)
  RETURNS TABLE(foo_id int, foo text) AS
$BODY$
BEGIN

SELECT INTO foo_id  count(*)::int
FROM   foo f
WHERE  f.foo_id < _max_id; -- get count

RETURN NEXT;               -- use first row for meta-data

RETURN QUERY               -- actual data starts with second row
SELECT f.foo_id, f.foo
FROM   foo f
WHERE  f.foo_id < _max_id
LIMIT  _limit
OFFSET _offset;

END;
$BODY$
  LANGUAGE plpgsql;

Call:

SELECT * FROM f_paginate(100);

Returns:

foo_id | foo
-------+----
86     | <NULL>    <-- first row = meta-data
1      | bar       <-- actual data
2      | baz
... 18 more ...

Obviously this method can save some bandwidth with a higher _limit (page size). With only a few rows, it's hardly worth the overhead.

The alternative approach would be your "Approach one" - add a column redundantly:

CREATE OR REPLACE FUNCTION f_paginate2(_max_id int, _limit int = 20
                                                  , _offset int = 0)
  RETURNS TABLE(foo_id int, foo text, ct bigint) AS
$BODY$
BEGIN

RETURN QUERY
SELECT f.foo_id, f.foo, count(*) OVER ()
FROM   foo f
WHERE  f.foo_id < _max_id
LIMIT  _limit
OFFSET _offset;

END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Call:

SELECT * FROM f_paginate2(100);

Returns:

foo_id | foo | ct
-------+-----+----
1      | bar | 86
2      | baz | 86
... 18 more ...

Performance is very similar in this simple case. The first query is slightly faster, but probably only because count(*) OVER () slows down the second query. A separate run with just count(*) is faster.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228