
I have a plpgsql function that takes a jsonb input, and uses it to first check something, and then again in a query to get results. Something like:

CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
    RETURNS integer
    LANGUAGE 'plpgsql'
    COST 100.0
    VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
    -- fail if there's something already there 
    IF EXISTS(
    select t.x from jsonb_populate_recordset(null::my_type, a) f inner join some_table t
    on f.x = t.x and
       f.y = t.y
    ) THEN
    RAISE EXCEPTION 'concurrency violation... already present.';
    END IF;

    -- straight insert, and collect number of inserted
    WITH inserted_rows AS (
        INSERT INTO some_table (x, y, z)
        SELECT f.x, f.y, f.z
        FROM jsonb_populate_recordset(null::my_type, a) f
        RETURNING 1
    )
    SELECT count(*) from inserted_rows INTO inserted
    ;
END
$function$;

Here, I'm using jsonb_populate_recordset(null::my_type, a) both in the IF check, and also in the actual insert. Is there a way to do the parsing once - perhaps via a variable of some sort? Or would the query optimiser kick in and ensure the parse operation happens only once?

ashic

3 Answers


If I understand correctly, you're looking for something like this:

CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
    RETURNS integer
    LANGUAGE 'plpgsql'
    COST 100.0
    VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
    WITH checked_rows AS (
        SELECT f.x, f.y, f.z, t.x IS NOT NULL as present
        FROM jsonb_populate_recordset(null::my_type, a) f
        LEFT join some_table t
            on f.x = t.x and f.y = t.y
    ), violated_rows AS (
        SELECT count(*) AS violated FROM checked_rows AS c WHERE c.present
    ), inserted_rows AS (
        INSERT INTO some_table (x, y, z)
        SELECT c.x, c.y, c.z
        FROM checked_rows AS c
        WHERE (SELECT violated FROM violated_rows) = 0
        RETURNING 1
    )
    SELECT count(*) from inserted_rows INTO inserted
    ;

    IF inserted = 0 THEN 
        RAISE EXCEPTION 'concurrency violation... already present.';
    END IF;

END;
$function$;
Roman Tkachuk

The jsonb type does not need to be parsed more than once; it is parsed at assignment:

while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed.

Link

The jsonb_populate_recordset function is declared STABLE:

STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements.

Link

I am not sure about this case: on the one hand, a UDF call is considered a single statement, but on the other hand, a UDF can contain multiple statements. Clarification is needed.

Finally, if you want to cache such things, you could use arrays:

CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
    RETURNS integer
    LANGUAGE 'plpgsql'
    COST 100.0
    VOLATILE NOT LEAKPROOF
AS $function$
DECLARE
    d my_type[]; -- variable for caching the parsed recordset
BEGIN
    select array_agg(f) into d from jsonb_populate_recordset(null::my_type, a) as f;
    -- fail if there's something already there 
    IF EXISTS(
      select *
      from some_table t
      where (t.x, t.y) in (select x, y from unnest(d)))
    THEN
      RAISE EXCEPTION 'concurrency violation... already present.';
    END IF;

    -- straight insert, and collect number of inserted
    WITH inserted_rows AS (
        INSERT INTO some_table (x, y, z)
        SELECT f.x, f.y, f.z
        FROM unnest(d) f
        RETURNING 1
    )
    SELECT count(*) from inserted_rows INTO inserted;
END $function$;
Abelisto

If you actually want to reuse a result set repeatedly, the general solution would be a temporary table. Example:
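A minimal sketch of that approach, reusing the names from the question and assuming my_type is a composite type created with CREATE TYPE. Note that plpgsql does not substitute variables into utility commands such as CREATE TABLE ... AS, so the temporary table is created empty and filled with a plain INSERT; ON COMMIT DROP assumes at most one call per transaction:

CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   -- parse the jsonb input exactly once
   CREATE TEMP TABLE tmp_input OF my_type ON COMMIT DROP;

   INSERT INTO tmp_input
   SELECT * FROM jsonb_populate_recordset(null::my_type, a);

   IF EXISTS (
      SELECT 1
      FROM   tmp_input f
      JOIN   some_table t USING (x, y)
      ) THEN
      RAISE EXCEPTION 'concurrency violation... already present.';
   END IF;

   INSERT INTO some_table (x, y, z)
   SELECT x, y, z FROM tmp_input;

   GET DIAGNOSTICS inserted = ROW_COUNT;
END
$func$  LANGUAGE plpgsql;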

However, that's rather expensive. Looks like all you need is a UNIQUE constraint or index:

Simple and safe with UNIQUE constraint

ALTER TABLE some_table ADD CONSTRAINT some_table_x_y_uni UNIQUE (x,y);

As opposed to your procedural attempt, this is also concurrency-safe (no race conditions). Much faster, too.

Then the function can be dead simple:

CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   INSERT INTO some_table (x, y, z)
   SELECT f.x, f.y, f.z
   FROM   jsonb_populate_recordset(null::my_type, a) f;

   GET DIAGNOSTICS inserted = ROW_COUNT;  -- OUT param, we're done here
END
$func$  LANGUAGE plpgsql;

If any (x,y) is already present in some_table, you get your exception. Choose an instructive name for the constraint, since it is reported in the error message.

And we can just read the command tag with GET DIAGNOSTICS, which is substantially cheaper than running another count query.
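If you would rather keep raising your own message than expose the constraint violation directly, here is a sketch of a variant that traps the error (the EXCEPTION block adds some overhead, since it creates a subtransaction):

CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   INSERT INTO some_table (x, y, z)
   SELECT f.x, f.y, f.z
   FROM   jsonb_populate_recordset(null::my_type, a) f;

   GET DIAGNOSTICS inserted = ROW_COUNT;
EXCEPTION
   WHEN unique_violation THEN  -- raised by the UNIQUE constraint on (x,y)
      RAISE EXCEPTION 'concurrency violation... already present.';
END
$func$  LANGUAGE plpgsql;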


UNIQUE constraint not possible?

For the unlikely case that a UNIQUE constraint should not be feasible, you can still keep it rather simple:

CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   INSERT INTO some_table (x, y, z)
   SELECT f.x, f.y, f.z  -- empty result set if there are any violations
   FROM  (
      SELECT f.x, f.y, f.z, count(t.x) OVER () AS conflicts
      FROM   jsonb_populate_recordset(null::my_type, a) f
      LEFT   JOIN some_table t USING (x,y)
      ) f
   WHERE  f.conflicts = 0;

   GET DIAGNOSTICS inserted = ROW_COUNT;

   IF inserted = 0 THEN
      RAISE EXCEPTION 'concurrency violation... already present.';
   END IF;

END
$func$  LANGUAGE plpgsql;

This counts the number of violations in the same query (count() only counts non-null values).

You should have at least a simple index on some_table (x,y) anyway.
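For instance (the index name is illustrative):

CREATE INDEX some_table_x_y_idx ON some_table (x, y);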

It's important to know that plpgsql does not return results before control exits the function. The exception cancels the return; the user never gets results, only the error message. We added a code example to the manual.

Note, however, that there are race conditions here under concurrent write load.

Would the query planner avoid repeated evaluation?

Certainly not between multiple SQL statements.

Even if the function itself is defined STABLE or IMMUTABLE (jsonb_populate_recordset() in the example is STABLE), the query planner does not know that values of input parameters are unchanged between calls. It would be expensive to keep track and make sure of it.
Actually, since plpgsql treats SQL statements like prepared statements, that's plainly impossible: the query is planned before parameter values are fed to it.

Erwin Brandstetter
  • The example is simplified. The actual use case is not achievable with a unique constraint (it's an insert only model with a new row for each successful update). – ashic Mar 03 '17 at 13:12
  • @ashic: `it's an insert only model with a new row for each successful update`. OK, so what's not applicable then? You desperately *need* an index on `(x,y)` anyway for performance. – Erwin Brandstetter Mar 03 '17 at 13:15
  • There is already an index on the necessary columns. It's just not unique. The question is more about the parsing (or re-parsing) of the parameter, and whether there's a way to reduce wasted calls (if the optimiser doesn't already do it). – ashic Mar 03 '17 at 15:55
  • @ashic: I added a general and a specific alternative for that. – Erwin Brandstetter Mar 03 '17 at 16:05
  • Thanks for the clarification. I'm trying out the various options later today to see which works. It definitely helps to know that the query planner won't do it automatically. – ashic Mar 03 '17 at 16:26
  • @ashic: So do you have your answer? – Erwin Brandstetter Mar 06 '17 at 15:01