Multirow subselect as parameter to `execute using`

Question

The multirow subselect will be used in the right hand side of the in operator in the where clause:

create table t (a integer);
insert into t (a) values (1), (9);

drop function if exists f();

create function f()
returns void as $$
begin
execute '
    select a
    from t
    where a in $1
' using (select 1 union select 2);
end;$$
language plpgsql;

select f();

ERROR:  more than one row returned by a subquery used as an expression
CONTEXT:  SQL statement "SELECT (select 1 union select 2)"
PL/pgSQL function "f" line 3 at EXECUTE statement

How to achieve what the above function would if it worked?

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

3

I see nothing in your question that couldn't more easily be solved with:

SELECT a
FROM   t
JOIN  (VALUES (1), (2)) x(a) USING (a); -- any query returning multiple int

Can you clarify the necessity in your example?

As a proof of concept, this would be simpler / faster:

CREATE OR REPLACE FUNCTION x.f1()
  RETURNS SETOF integer AS
$BODY$
BEGIN
RETURN QUERY EXECUTE '
    SELECT a
    FROM t
    WHERE a = ANY($1)'
USING ARRAY(VALUES (1), (2)); -- any query here
END;
$BODY$
LANGUAGE plpgsql;

Performance of IN () and = ANY()

Your observation is spot on. And there is reason for it. Try:

EXPLAIN ANALYZE SELECT * FROM tbl WHERE id IN (1,2,3);

The query plan will reveal:

Index Cond: (id = ANY ('{1,2,3}'::integer[]))

PostgreSQL transforms the id IN (..) construct into id = ANY(..) internally. These two perform identically - except for a negligible overhead penalty.

My code is only faster in building the statement - exactly as you diagnosed.

edited Jun 20 '20 at 09:12

Community

1
1

answered Mar 21 '12 at 18:40

Erwin Brandstetter

605,456
145
1,078
1,228

Yes simpler and probably faster at building the query (I'm accepting it) but the execution plan cost is exactly the same as my own answer (`=any` vs `in`). The necessity is that the table is partitioned and if the `where` parameters are not constants the planner will not optimize the query. – Clodoaldo Neto Mar 21 '12 at 21:29
@Clodoaldo: I added a bit concerning your observation about performance. – Erwin Brandstetter Mar 21 '12 at 21:45
@Clodoaldo: You could take something like my simple query and run it with `EXECUTE`. Forces re-planning in any case. A `JOIN` will be faster than an `IN ()` expression, especially with longer lists of values. – Erwin Brandstetter Mar 21 '12 at 21:56
No I mean what I wrote. Check [one of the last paragraphs](http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS) _http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS_ I have experienced sluggish queries turn into blazing fast ones when the `where` or `join` parameters were turned into constants. Oh you just deleted your comment. I guess you checked the manual or something. No re-planning will avoid this planner behavior when the condition involves a partitioned table. – Clodoaldo Neto Mar 21 '12 at 22:05
@Clodoaldo: Yeah, I removed the first sentence, right after I had typed it because I had it backwards. – Erwin Brandstetter Mar 21 '12 at 22:48

score 0 · Answer 2 · answered Mar 21 '12 at 17:47

0

create function f()
returns setof integer as $$
begin
return query execute format('
    select a
    from t
    where a in %s
', replace(replace(array(select 1 union select 2)::text, '{', '('), '}', ')'));
end$$
language plpgsql;

answered Mar 21 '12 at 17:47

Clodoaldo Neto

118,695
26
233
260

Multirow subselect as parameter to `execute using`

2 Answers2

Performance of IN () and = ANY()

Linked