1

This question comes as a follow up to and a result of SQL function very slow compared to query without function wrapper. I should note that I don't consider this a duplicate, since that question was asking for a solution to a specific problem. I am asking for more information about the behavior in general here, and demonstrating how it can be reproduced. (To demonstrate the difference, you can see a fairly long comment thread on the accepted answer where we discussed the behavior, and I felt it was getting off topic, especially given the length.)

I have a function. Here's a sample one that exhibits the behavior of interest:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  VOLATILE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1
  $$;

When I run this EXPLAIN:

EXPLAIN ANALYZE SELECT * FROM test(10);

I get this result in psql (where I've removed a giant "Query Plan" header):

 Function Scan on test  (cost=0.25..10.25 rows=1000 width=36) (actual time=0.125..0.136 rows=5 loops=1)
 Total runtime: 0.179 ms
(2 rows)

Take note of the row estimation. It estimates 1000 rows.

But, if I change the function to STABLE or IMMUTABLE:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  STABLE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1
  $$;

Then the same EXPLAIN gives me a different plan:

 Limit  (cost=0.00..0.06 rows=5 width=36) (actual time=0.010..0.050 rows=5 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=36) (actual time=0.005..0.018 rows=5 loops=1)
 Total runtime: 0.087 ms
(3 rows)

Now it correctly estimates 5 rows, and it shows a plan for the query contained inside the function. The cost is an order of magnitude higher. The runtime went down as well. (The query is so short, that might not be especially significant.)

In light of the linked question dealing with much more data and having a very significant performance difference, it would seem that the planner is actually doing something different based on whether the function is VOLATILE or STABLE/IMMUTABLE.

What exactly is the planner doing here, and where can I read some documentation on it?

These tests were run in PG 9.3.

Community
  • 1
  • 1
jpmc26
  • 28,463
  • 14
  • 94
  • 146

1 Answers1

2

It estimates 1000 rows

1000 estimated rows is a default value documented in CREATE FUNCTION:

execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

result_rows

A positive number giving the estimated number of rows that the planner should expect the function to return. This is only allowed when the function is declared to return a set. The default assumption is 1000 rows.

When a function is declared volatile, it's asking to not be inlined, so this default value for result_rows holds.

On the other hand, when it's getting inlined in a query like in your second test, the number of rows is going to be estimated as if the body of the function had been moved into the query and the function declaration didn't exist. This leads in the second test to an exact estimate since the VALUES clause can be directly evaluated.

What exactly is the planner doing here, and where can I read some documentation on it?

In general, the planner's optimization strategies are not explained in the main documentation. They're discussed in mailing lists and mentioned in the source code comments, which fortunately tend to be exceptionally clear and well written (compared to average source code). In the case of function inlining, I believe the comments of inline_set_returning_functions and inline_set_returning_function reveal most of the rules driving this particular optimization. (warning: above links are into the current master branch which is subject to change or drift anytime).

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • Can you point to any documentation about inlining functions? I haven't been able to hunt any down. – jpmc26 Jan 31 '15 at 01:42