1

I have the following function which I'd like to use in a sql query (Postgres 9.3):

SELECT * FROM test_table tt WHERE has_access(tt.id, tt.login)

CREATE OR REPLACE FUNCTION has_access(integer, integer)
RETURNS boolean AS
$BODY$
SELECT
  EXISTS (SELECT true
            FROM test_read_access
          WHERE id = $1 and login = $2 
  )
  AND
  NOT EXISTS (SELECT true
            FROM test_no_read_access
          WHERE id = $1 and login = $2 
  )
$BODY$

This works fine as long as I have to concern only about functional correctness. Because the query analyzer tells me, that the function has to be evaluated for every row and therefore the EXISTS-clauses cannot be optimized as expected. Indeed the query is very slow compared to the following query (inlining the EXISTS-clauses without the SELECT-clause):

SELECT * FROM test_table tt WHERE 
   EXISTS (SELECT true
            FROM test_read_access
          WHERE id = tt.id and login = tt.login 
  )
  AND
  NOT EXISTS (SELECT true
            FROM test_no_read_access
          WHERE id = tt.id and login = tt.login 
  )

The intention of the function has_access(id, login) is to group some access rule in a function and to use it then in different queries. I mean, it is possible to do something like this to get a good performance:

SELECT * FROM test_table tt WHERE EXISTS (select has_access(tt.id, tt.login))

CREATE OR REPLACE FUNCTION has_access(integer, integer)
RETURNS SETOF boolean AS
$BODY$
SELECT true
   FROM test_read_access
WHERE id = $1 and login = $2 
$BODY$

But now I only have one sub query on one table in the function and this is not useful in my case. Any suggestion on how to do this properly in order to not run into performance issues?

Thanks!

reinoslav
  • 11
  • 1
  • 3
  • EXISTS tends to result in correlated subqueries, which are harder for the DBMS to optimize; using IN/NOT IN tends to be more efficient. – okaram Nov 04 '14 at 00:39
  • @okaram: completely wrong. Please note, that in this case there is no correlated subquery, since there is no main query. – wildplasser Nov 04 '14 at 00:52
  • @wildplasser, the query in the function is not correlated, but the one above is (but I had not understood that the problem was the function when I made the comment :) – okaram Nov 04 '14 at 01:43

1 Answers1

0

OK, I think I see what your problem is; the function calls are not optimizable, so you need to do the query outside the function; something like

SELECT *
  FROM test_table
WHERE (id,login) IN (SELECT id,login FROM test_read_access)
  AND (id,login) NOT IN (SELECT id,login FROM test_no_read_access)

Check http://sqlfiddle.com/#!12/94a02/2

okaram
  • 1,444
  • 13
  • 11
  • 1
    There is no reason to prefer `NOT IN(...)` here ; `NOT EXISTS(...)` works just as well, or maybe even better. The performance problem in the OQ is in the function calling, which cannot be broken up by the optimiser. – wildplasser Nov 04 '14 at 00:56
  • @wildplasser I now understand the problem is the function call; however, usually in/not in queries are easier to optimize, right ? EXISTS tend to do correlated subqueries (need to refer to the outside row inside the query), which are harder for the optimizer – okaram Nov 04 '14 at 01:41
  • 1
    In almost all my numerous tests in Postgres over past years, `NOT EXISTS` turned out to be faster than `NOT IN`. In addition, `NOT IN` exhibits tricky behaviour with NULL values. [More here.](http://stackoverflow.com/a/19364694/939860). There may changes in the upcoming pg 9.4. Not saying there are, not expecting either. Just didn't test, yet. – Erwin Brandstetter Nov 04 '14 at 22:17