22

I have an SQL function that runs a simple SELECT statement:

CREATE OR REPLACE FUNCTION getStuff(param character varying)
  RETURNS SETOF stuff AS
$BODY$
    select *
    from stuff
    where col = $1
$BODY$
  LANGUAGE sql;

For now I am invoking this function like this:

select * from getStuff('hello');

What are my options if I need to order and limit the results with ORDER BY and LIMIT clauses?

I guess a query like this:

select * from getStuff('hello') order by col2 limit 100;

would not be very efficient, because all rows from table stuff will be returned by function getStuff and only then ordered and sliced by limit.

But even if I am right, there is no easy way to pass the ORDER BY argument to an SQL-language function. Only values can be passed, not parts of an SQL statement.

Another option is to create the function in plpgsql language, where it is possible to construct the query and execute it via EXECUTE. But this is not a very nice approach either.

So, is there any other method of achieving this? Or what option would you choose? Ordering/limiting outside the function, or plpgsql?

I am using PostgreSQL 9.1.

Edit

I modified the CREATE FUNCTION statement like this:

CREATE OR REPLACE FUNCTION getStuff(param character varying, orderby character varying)
  RETURNS SETOF stuff AS
$BODY$
    select t.*
    from stuff t
    where col = $1
    ORDER BY
        CASE WHEN $2 = 'parent' THEN t.parent END,
        CASE WHEN $2 = 'type' THEN t."type" END, 
        CASE WHEN $2 = 'title' THEN t.title END

$BODY$
  LANGUAGE sql;

This throws:

ERROR: CASE types character varying and integer cannot be matched
LINE 13: WHEN $1 = 'parent' THEN t.parent

The stuff table looks like this:

CREATE TABLE stuff
    (
      id serial,
      "type" integer NOT NULL,
      parent integer,
      title character varying(100) NOT NULL,
      description text,
      CONSTRAINT "pkId" PRIMARY KEY (id)
    );

Edit2

I misread Dems's code. I have corrected it in the question, and the code above now works for me.

Erwin Brandstetter
JoshuaBoshi
  • Why is using PL/pgSQL and `EXECUTE` not a nice approach? It should not make a big difference in terms of performance, and it is the only solution I can think of. –  Nov 15 '11 at 16:30
  • Hmm, mainly because of performance, which I thought would be very low in comparison with an SQL-language function, or at best comparable to `select * from getStuff('hello') order by col2 limit 100;`, which is nicer for me to write (from the point of view of the whole app I am building). – JoshuaBoshi Nov 15 '11 at 16:36
  • Using `EXECUTE` will be a bit slower (because of the additional parsing going on), but I doubt you'll be able to measure the difference. –  Nov 15 '11 at 16:37
  • @JoshuaBoshi: Guessing about performance impact doesn't usually work well. – Mike Sherrill 'Cat Recall' Nov 15 '11 at 16:41
  • OK :-) Feel free to write an answer and I will accept it if nobody comes up with another solution :-) – JoshuaBoshi Nov 15 '11 at 16:42
  • @Catcall: Of course, I know. I should do some measurements. But in this case I was quite sure that plpgsql functions are a grade slower than sql functions, so it did not occur to me to measure anything. – JoshuaBoshi Nov 15 '11 at 16:47
  • EXECUTE has an overhead of parsing and compiling, but if postgresql allows parameterised dynamic sql, the result may be cacheable, which reduces the overhead to a hash lookup. Such overheads are not really noticeable except in very rapidly repeating queries. Also, note that the alternatives have a different overhead: a one-size-fits-all plan. And such a plan may be so extraordinarily inefficient that it cripples performance. – MatBailie Nov 15 '11 at 16:59
  • @Dems: So if I understand you correctly, you recommend using `EXECUTE` rather than the `ORDER BY CASE`? – JoshuaBoshi Nov 15 '11 at 17:34
  • @JoshuaBoshi - You'd have to test it for your particular cases. Amount of data, allowable combinations of sorting fields, available indexes, fragmentation of data, etc, can all have an impact. Dynamic SQL with EXECUTE will invariably yield a plan that performs equal to or better than a single CASE based expression. But it *feels* messier and so can be harder to maintain. Testing would show the performance differences. I often value maintenance over performance ***if*** the performance differences are not marked. – MatBailie Nov 15 '11 at 17:44

5 Answers

41

There is nothing wrong with a plpgsql function for anything a little more complex. The only situation where performance can suffer is when a plpgsql function is nested, because the query planner cannot further optimize the contained code in the context of the outer query which may or may not make it slower.
More details in this later answer:

This is much simpler than lots of CASE clauses in a query:

CREATE OR REPLACE FUNCTION get_stuff(_param text, _orderby text, _limit int)
  RETURNS SETOF stuff AS
$func$
BEGIN
   RETURN QUERY EXECUTE '
      SELECT *
      FROM   stuff
      WHERE  col = $1
      ORDER  BY ' || quote_ident(_orderby) || ' ASC
      LIMIT  $2'
   USING _param, _limit;
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM get_stuff('hello', 'col2', 100);

Notes

Use RETURN QUERY EXECUTE to return the results of the query in one go.

Use quote_ident() for identifiers to safeguard against SQLi.
Or format() for anything more complex. See:

Pass parameter values with the USING clause to avoid casting, quoting and SQLi once again.
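As a rough sketch of how these notes might be combined, the variant below builds the identifier with format() and %I, keeps the values in the USING clause, and also accepts a sort direction. The function name get_stuff_sorted and the _direction parameter are hypothetical and not part of the original function. The direction is mapped onto a fixed keyword instead of being concatenated from the input. format() is available from PostgreSQL 9.1 on.

```sql
-- Hypothetical variant of get_stuff(); the name and the _direction parameter are assumptions.
CREATE OR REPLACE FUNCTION get_stuff_sorted(_param text, _orderby text,
                                            _direction text, _limit int)
  RETURNS SETOF stuff AS
$func$
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT *
      FROM   stuff
      WHERE  col = $1
      ORDER  BY %I %s
      LIMIT  $2',
      _orderby,                              -- %I quotes the column name as an identifier
      CASE WHEN lower(_direction) = 'desc'
           THEN 'DESC' ELSE 'ASC' END)       -- %s only ever receives one of two fixed keywords
   USING _param, _limit;                     -- values are passed separately, not concatenated
END
$func$  LANGUAGE plpgsql;

-- Example call: newest ordering first, at most 100 rows.
SELECT * FROM get_stuff_sorted('hello', 'col2', 'desc', 100);
```

Any direction value other than 'desc' (including NULL) falls back to ASC in this sketch, so no user-supplied text reaches the query string unquoted.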

Be careful not to create naming conflicts between parameters and column names. I prefixed parameter names with an underscore (_) in the example. Just my personal preference.

Your second function after the edit cannot work, because you only return parent while the return type is declared SETOF stuff. You can declare any return type you like, but actual return values have to match the declaration. You might want to use RETURNS TABLE for that.

Erwin Brandstetter
  • Brandstetter: Wow, now you have taught me some plpgsql :-) I had no idea about `RETURN QUERY EXECUTE` and `USING`. This is a really elegant solution, and I have no worries about the plpgsql method now. Thank you very much! – JoshuaBoshi Nov 16 '11 at 10:02
  • How can I add ASC or DESC with it? – Sachin Apr 11 '18 at 06:11
  • @erwin: I want to make it dynamic, as is done for _orderby. I want to pass ascending or descending dynamically. – Sachin Apr 11 '18 at 12:15
  • @Sachin: Ask a new question with relevant details. Comments are not the place. You can always link to this one if you need the context. There are simple & safe solutions. – Erwin Brandstetter Apr 11 '18 at 12:20
  • Please check this question. https://stackoverflow.com/questions/49775242/postgresql-dynamic-order-by-and-sorting-with-stored-procedure – Sachin Apr 11 '18 at 12:31
5

If your function is stable (does not modify the database), the query planner will typically inline it. Therefore, doing SELECT * FROM getStuff('x') LIMIT 10 will produce the same query plan as if the limit were inside getStuff().

However, you need to tell PG your function is stable by declaring it as such:

CREATE OR REPLACE FUNCTION getStuff(param varchar)
RETURNS setof STUFF
LANGUAGE SQL
STABLE
AS $$ ... $$;

Now doing EXPLAIN SELECT * FROM getStuff('x') LIMIT 1 should produce the same query plan as writing out the equivalent query would.
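One way to check this, sketched under the assumption that getStuff() has been declared STABLE as above and wraps select * from stuff where col = $1, is to compare the two plans side by side. If the function was inlined, the first plan scans stuff directly. If it was not, the plan contains a Function Scan on getstuff node instead.

```sql
-- Plan for the function call.
EXPLAIN SELECT * FROM getStuff('x') LIMIT 1;

-- Plan for the hand-written equivalent, for comparison.
EXPLAIN SELECT * FROM stuff WHERE col = 'x' LIMIT 1;
```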

The inlining should also work for ORDER BY clauses outside the function. But if you wanted to parameterize the function to determine the order by, you could do it like this to also control the sort direction:

CREATE FUNCTION sort_stuff(sort_col TEXT, sort_dir TEXT DEFAULT 'asc')
RETURNS SETOF stuff
LANGUAGE SQL
STABLE
AS $$
    SELECT *
    FROM stuff
    ORDER BY
      -- Simplified to NULL if not sorting in ascending order.
      CASE WHEN sort_dir = 'asc' THEN
          CASE sort_col
              -- Check for each possible value of sort_col.
              WHEN 'col1' THEN col1
              WHEN 'col2' THEN col2
              WHEN 'col3' THEN col3
              --- etc.
              ELSE NULL
          END
      ELSE
          NULL
      END
      ASC,

      -- Same as before, but for sort_dir = 'desc'
      CASE WHEN sort_dir = 'desc' THEN
          CASE sort_col
              WHEN 'col1' THEN col1
              WHEN 'col2' THEN col2
              WHEN 'col3' THEN col3
              ELSE NULL
          END
      ELSE
          NULL
      END
      DESC
$$;

As long as sort_col and sort_dir are constant within the query, the query planner should be able to simplify the verbose-looking query to

SELECT *
FROM stuff
ORDER BY <sort_col> <sort_dir>

which you can verify using EXPLAIN.
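A possible way to try this out is shown below. It assumes that stuff has columns col1, col2 and col3 of the same data type (a CASE expression needs one common result type) and that the server runs PostgreSQL 9.2 or later, where SQL functions can refer to parameters by name.

```sql
-- Ten rows ordered by col2, descending.
SELECT * FROM sort_stuff('col2', 'desc') LIMIT 10;

-- sort_dir falls back to its default, 'asc'.
SELECT * FROM sort_stuff('col1') LIMIT 10;

-- Inspect the plan to see whether the CASE expressions were simplified away.
EXPLAIN SELECT * FROM sort_stuff('col2', 'desc') LIMIT 10;
```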

dmg
2

As to the ORDER BY you could try something like this:

SELECT
    <column list>
FROM
    Stuff
WHERE
    col1 = $1
ORDER BY
    CASE $2
        WHEN 'col1' THEN col1
        WHEN 'col2' THEN col2
        WHEN 'col3' THEN col3
        ELSE col1  -- Or whatever your default should be
    END

You might have to do some data type conversions so that all of the data types in the CASE result match. Just be careful about converting numerics to strings - you'll have to prepend 0s to make them order correctly. The same goes for date/time values. Order by a format that has year followed by month followed by day, etc.

I've done this in SQL Server, but never in PostgreSQL, and I don't have a copy of PostgreSQL on this machine, so this is untested.
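For the stuff table from the question, where parent and "type" are integers and title is a character varying column, the data type conversion mentioned above might look like the sketch below. It is meant as the body of an SQL function with two parameters. The padding width of 10 is an assumption that fits non-negative 32-bit integers; negative values would need separate handling.

```sql
SELECT *
FROM   stuff
WHERE  col = $1
ORDER  BY
    CASE $2
        -- Zero-pad so that '0000000002' sorts before '0000000010'.
        WHEN 'parent' THEN lpad(parent::text, 10, '0')
        WHEN 'type'   THEN lpad("type"::text, 10, '0')
        WHEN 'title'  THEN title
        ELSE title     -- assumed default sort column
    END;
```

Every branch now yields a string, so the CASE expression has a single result type.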

Tom H
  • To avoid the data type problem... `ORDER BY CASE WHEN $2 = 'a' THEN a END, CASE WHEN $2 = 'b' THEN b END, etc, etc`. But note, this has the same optimisation problem as I mentioned on soulcheck's answer. – MatBailie Nov 15 '11 at 16:56
  • That's not the data type problem that I was referring to. Obviously $2 will always be the same data type. If column a and column b are different data types, though, then it might cause some issues. – Tom H Nov 15 '11 at 17:08
  • The example I gave deals with that by having each field as a separate clause in the ORDER BY. The CASE statements yield `ORDER BY null, null, x, null` (for example), and so results in data type independence. – MatBailie Nov 15 '11 at 17:13
  • Thank you for your answers and comments. I am using Dems's version of `CASE` but postgresql throws: `ERROR: CASE types character varying and integer cannot be matched` when I try to create the function (at the position after `THEN`, weird). I will edit the question and add the full source... – JoshuaBoshi Nov 15 '11 at 17:20
  • Exactly the issue that I was talking about. You need to make sure that you convert all of the results to the same data type. – Tom H Nov 15 '11 at 17:41
  • Does postgres require the ELSE statement? `ORDER BY (CASE WHEN $2 = 'a' THEN table.a ELSE NULL END), (CASE WHEN $2 = 'b' THEN table.b ELSE NULL END), etc, etc`? – MatBailie Nov 15 '11 at 17:47
  • @TomH Sorry, I misread Dems's code and did not make a separate CASE for each "branch". Now it is working and it makes sense :-) – JoshuaBoshi Nov 15 '11 at 17:52
  • @Dems Sorry, see my previous comment here .-X – JoshuaBoshi Nov 15 '11 at 17:53
1

Using the format() function, even with the ILIKE operator:

CREATE OR REPLACE FUNCTION get_customer(
      _param text, _orderby text, _limit int)
      RETURNS SETOF customer AS
$func$

BEGIN
   -- %L quotes the search pattern as a literal, so _param cannot break out of the string.
   RETURN QUERY EXECUTE format('
      SELECT *
      FROM   customer
      WHERE  first_name ILIKE %L
      ORDER  BY %I DESC
      LIMIT  %L',
      '%' || _param || '%', _orderby, _limit);
END
$func$  LANGUAGE plpgsql;

Format reference: https://www.postgresql.org/docs/current/functions-string.html

jian
0

You can pass the limit value as a function argument without any problems. As for ordering, you can use ORDER BY in combination with a CASE expression. Unfortunately, this won't work for something like

ORDER BY CASE condition_variable
WHEN 'asc' THEN column_name ASC
ELSE column_name DESC
END;
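A form that PostgreSQL does accept, sketched here for the stuff table from the question, keeps ASC and DESC outside the CASE expressions and passes the limit as an ordinary argument. The function name get_stuff_dir and its parameters are hypothetical, and title is only an example sort column.

```sql
-- Hypothetical function; name, parameters and sort column are assumptions.
CREATE OR REPLACE FUNCTION get_stuff_dir(param character varying,
                                         dir character varying,
                                         lim integer)
  RETURNS SETOF stuff AS
$BODY$
    select *
    from stuff
    where col = $1
    order by
        -- Yields title only when ascending order was requested, NULL otherwise.
        case when $2 = 'asc' then title end asc,
        -- Yields title in every other case, so this key does the sorting.
        case when $2 = 'asc' then null else title end desc
    limit $3
$BODY$
  LANGUAGE sql;
```

For any given call, one of the two sort keys is NULL for every row and therefore has no effect on the order.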
soulcheck
  • 1) You need to replace `1` with `a`. 2) This will prevent the optimiser from being able to use indexes, etc. It *does* yield a single query for multiple purposes, but one should test this against dynamic SQL to check what the performance overhead is. (Only one plan can be created for each single query, but different order by clauses *may* require different plans to be efficient.) – MatBailie Nov 15 '11 at 16:54
  • It's correct: it orders by the first column descending or ascending depending on whether a is equal to 'asc' or not. But I'll edit anyway to make it clearer. – soulcheck Nov 15 '11 at 16:59
  • This is not legal. You can't have ASC or DESC inside the expression. – asnyder Mar 06 '14 at 19:23
  • @asnyder yeah, it was an example of what will not work ;) Very old answer so the quality leaves much to be desired. – soulcheck Mar 06 '14 at 20:19
  • You can do `ORDER BY CASE var WHEN 'asc' THEN col ELSE 0 END ASC, CASE var WHEN 'desc' THEN col ELSE 0 END DESC`. Postgres will (in my experience) optimize this down to a single index scan on the non-constant condition for any particular set of query parameters. – wchargin Jan 31 '23 at 05:54