Capture number of rows affected by dynamic sql?

Question

I am trying to get the return from a QUERY EXEUTE in a plpgsql function to be able to check how many rows were affected from a dynamic update query. My use case is adding an event (with a custom payload) to a separate table on insert or update to a dynamically set table. Because my event has a custom payload, I have not been able to use a database trigger (e.g. trigger before insert). As a simplified example, assume I have this table:

CREATE TABLE users (user_id text primary key, name text)

Here is my simplified events table:

CREATE TABLE events(event_id text primary key, payload json)

Here is my simplified function:

CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
    RETURNS SETOF anyelement AS
$func$
DECLARE
    event_id text;
BEGIN
    SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
    execute format('insert into event(event_id, payload) values ($1, $2)') using event_id, payload;
    RETURN QUERY EXECUTE format('%s', q);
END
$func$ LANGUAGE plpgsql;

The goal is to have this work exactly the same as if someone had created these in a transaction. In pseucode for insert:

BEGIN
insert into events(id, payload) values($1, $2)
insert into users(columns) values(<any values>)
COMMIT

and similarly for update:

BEGIN
insert into events(id, payload) values($1, $2)
result, error := query(`update users set name = 'hello' where id = 'Not Exists Thus No Rows Modified'`);
if result.rowsAffected() == 0 {
   ROLLBACK
}
COMMIT

The function my_function almost works except for one edge case: when an update actually doesn't affect any rows. For example, this works:

select * from my_function(NULL::users, 
       'insert into users(id,name) values('u1', ''a2'') returning *',
       payload => '{"id": "e1", "custom": "s1", "field": "2019-10-12T07:20:50.52Z"}')

As expected, after this is done both a row in the users table and the events table is created. What fails is the following:

select * from my_function(NULL::users, 
       'update users set name = ''hello'' where user_id = ''NotExists'' returning *',
       payload => '{"id": "e2", "custom": "s3", "field": "2019-10-12T07:20:50.52Z"}')

Here, a row is created in the events table (my goal is that it should not be created). I know this approach is not elegant, and I know this is vulnerable to SQL injection. I'd love suggestions on better ways to solve this (including scrapping what we're doing now). But to answer the question directly, I'm looking to store the result of QUERY EXECUTE, check if any rows were affected, and raise an error so that there is never a case where a row in the events table is created when there is not real corresponding change in the users table. Users table is just an example, in general, it could be any dynamically set table.

https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS — , Jul 02 '21 at 07:55
Thank you very much a_horse_with_no_name! I'm very appreciative of your comment. I see there that I can probably use `FOUND` However, I can't figure out how to store the result because the type returned is dynamic. As in, I can't do something like this: ``` EXECUTE format('%s', q) into row_with_dynamic_type; IF NOT FOUND THEN RAISE EXCEPTION 'failed' END IF; RETURN row_with_dynamic_type ``` Because execute may return one or more columns depending on the dynamic query. Do you have any suggestions? what can I set as the type of `row_with_dynamic_type` so that I can still return? — A H, Jul 02 '21 at 08:12
No, not the `FOUND` variable but the `GET DIAGNOSTICS integer_var = ROW_COUNT;` construct will help solve your issue by inspecting the value of `integer_var`. — Stefanov.sm, Jul 02 '21 at 08:25
Thank you Stevanov.sm! I still have the same issue though, which is how to store the output of `QUERY EXECUTE` into a variable that I don't know the type (to be able to use `GET DIAGNOSTICS`. How can I have `QUERY EXECUTE format('%s', q) into ` to then follow that with `GET DIAGNOSTICS integer_var = ROW_COUNT;` and then finally `RETURN `? — A H, Jul 02 '21 at 21:38
`jsonb_object_field_text()` is not a Postgres function. Can you disclose its definition? Where does the ominous `q` come from? I am asking because this is a gateway for SQL injection if I have ever seen one. And please always disclose your version of Postgres. — Erwin Brandstetter, Jul 03 '21 at 21:27
Thanks Erwin. As mentioned in the post, I am aware that the formulated way above is extremely vulnerable to sql injection. The query does not come from an end-user (but from the application itself). The final form is using: ` CREATE OR REPLACE FUNCTION my_function( _rowtype any, q text, payload jsonb, n int, a1 text, a2 text, a3 text) RETURNS SETOF any AS ... if n == 3: return query execute format('%s', q) using a1, a2, a3; ... ``` I believe this parameterization should guard against sql injection. Since the args are text, the caller casts the correct type in `q` itself — A H, Jul 03 '21 at 21:54
`I believe this parameterization should guard against sql injection.` When done properly for the right use case, yes. In your case, no. `q` is passed as text and executed as code. — Erwin Brandstetter, Jul 03 '21 at 22:32
@Stefanov.sm: There may be a misunderstanding. See my answer. — Erwin Brandstetter, Jul 03 '21 at 22:34
Thanks Erwin, I understand the concern. As I said, q is guaranteed to be parametrized. It is always called by the same application for a small set of static queries and always itself is parametrized: `insert into audit_history(id, name) values($1, $2)`, `insert into another_table(id, created, mode) values ($1, $2::timestamptz, $3::smallint)` etc. There are about 10 static queries that q accepts and they are always the same. I understand the concern and know the risks. I appreciate the response. — A H, Jul 04 '21 at 00:37
OK, you understand the risks. If it's just a handful of static queries, maybe store them inside the database safely, and only pass a query token? Either in the function directly, switched with a `CASE` construct, or in a small table (with privileged access). Safer, all the SQL code and possible dependencies in one place, and simpler function call, too. — Erwin Brandstetter, Jul 04 '21 at 01:23

score 1 · Answer 1 · answered Jul 03 '21 at 05:37

A RETURN QUERY doesn't need to go to the end of the function, it only says: "the results of this query are part of the resulting set".

So you can use the RETURN QUERY, ask for FOUND and act accordingly. Here is your function modified for working this way:

CREATE OR REPLACE FUNCTION public.my_function(_rowtype anyelement, q text, payload jsonb)
 RETURNS SETOF anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE
    event_id text;
BEGIN
    SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;

    RETURN QUERY EXECUTE format('%s', q);
    IF FOUND THEN
        execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
    END IF;

    RETURN;
END
$function$

PD: Maybe you can also solve your problem with triggers FOR EACH STATEMENT using the transition tables OLD and NEW (which are available since v10, https://www.postgresql.org/docs/10/sql-createtrigger.html)

Erwin Brandstetter · Accepted Answer · 2021-07-04T01:19:22.673

CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE q;

   IF NOT FOUND THEN 
      RETURN;  -- nothing happened yet, we can exit silently.
      -- Or you WANT an error for this case. Then do this instead:
      -- RAISE EXCEPTION 'Query passed in parameter "q" did not affect any rows. Doing nothing!';
   END IF;
   
   INSERT INTO event(event_id, payload)
   VALUES (payload->>'id', payload);
END
$func$;

As has been commented, RETURN QUERY does not return from the function. The manual:

RETURN NEXT and RETURN QUERY do not actually return from the function — they simply append zero or more rows to the function's result set. Execution then continues with the next statement in the PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY commands are executed, the result set is built up. A final RETURN, which should have no argument, causes control to exit the function (or you can just let control reach the end of the function).

There's a code example for your case exactly at the bottom of that chapter in the manual. From me, actually. Originating here:

FUNCTION syntax error

It was suggested to use GET DIAGNOSTICS instead of the simpler FOUND. It's true that EXECUTE does not set the state of FOUND. But RETURN QUERY does. So keep using the simpler FOUND. Related:

Dynamic SQL (EXECUTE) as condition for IF statement

You have format() in your original twice. And while that's typically very useful for dynamic SQL, it's useless in your case. EXECUTE format('%s', q) is exactly the same as just EXECUTE q, with added cost. Both are open doors for SQL injection when passing user input.

While there is a good chance that the transaction might be rolled back, start with the critical step, and do the rest later. Avoid wasting the work. So I moved executing q to the top. Assuming it does not depend on the "payload" row, now inserted later.

Also, INSERT INTO events can be plain SQL. Nothing dynamic there. No need for format() or EXECUTE.

Finally, assuming your jsonb_object_field_text (payload, 'id')::text is just a fancy way of saying payload->>'id'. No need for an additional variable and another SELECT INTO.

Warning against SQL injection

Converting user input (parameter q in the example) to code to execute dynamically is the most direct SQL injection vulnerability of all. I wouldn't want to be caught in my underwear doing that.

Capture number of rows affected by dynamic sql?

2 Answers2

Warning against SQL injection