2

After reading Postgres manual and many posts here, I wrote this function tacking in mind all I found regarding security. It works great and does everything I looked for.

  • takes a json, each key has an array [visible, filter, arg1, optional arg2]

SELECT public.goods__list_json2('{"name": [true, 7, "Ad%"], "category": [true], "stock": [false, 4, 0]}', 20, 0);

  • returns a json array with requested data.

[{"name": "Adventures of TRON", "category": "Atari 2600"}, {"name": "Adventure", "category": "Atari 2600"}]

My question is, how could I really be sure that when I create the query using user input arguments, passing them as %L with format is injection safe? By my db design, all is done through functions, running most of them as security definer only allowing certain roles to execute them.

Being secure, my intention is to convert old functions to this dynamic logic and save myself to write a lot of lines of code creating new or specific queries.

I would really appreciate a experienced Postgres developer could give me an advice on this.

I'm using Postgres 13.

CREATE FUNCTION public.goods__list_json (IN option__j jsonb, IN limit__i integer, IN offset__i integer)
    RETURNS jsonb
    LANGUAGE plpgsql
    VOLATILE 
    STRICT
    SECURITY DEFINER
    COST 1
    AS $$
DECLARE
    table__v varchar := 'public.goods_full';
  column__v varchar[] := ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'];
    filter__v varchar[] := ARRAY['<', '>', '<=', '>=', '=', '<>', 'LIKE', 'NOT LIKE', 'ILIKE', 'NOT ILIKE', 'BETWEEN', 'NOT BETWEEN'];
    select__v varchar[];
    where__v varchar[];
  sql__v varchar;
    key__v varchar;
    format__v varchar;
    temp__v varchar;
    temp__i integer;
    betw__v varchar;
    result__j jsonb;
BEGIN
    FOR key__v IN SELECT jsonb_object_keys(option__j) LOOP
        IF key__v = ANY(column__v) THEN
            IF (option__j->key__v->0)::bool THEN
                select__v := array_append(select__v, key__v);
            END IF;
            temp__i := (option__j->key__v->1)::int;
            IF temp__i > 0 AND temp__i <= array_length(filter__v, 1) THEN
                temp__v := (option__j->key__v->>2)::varchar;
                IF temp__i >= 11 THEN
                    betw__v := (option__j->key__v->>3)::varchar;
                    format__v := format('%I %s %L AND %L', key__v, filter__v[temp__i], temp__v, betw__v);
                ELSE
                    format__v := format('%I %s %L', key__v, filter__v[temp__i], temp__v);
                END IF;
                where__v := array_append(where__v, format__v);
            END IF;
        END IF;
    END LOOP;
    sql__v := 'SELECT jsonb_agg(t) FROM (SELECT '
        || array_to_string(select__v, ', ')
        || format(' FROM %s WHERE ', table__v)
        || array_to_string(where__v, ' AND ')
        || format(' OFFSET %L LIMIT %L', offset__i, limit__i)
        || ') t';
    RAISE NOTICE 'SQL: %', sql__v;
    EXECUTE sql__v INTO result__j;
    RETURN result__j;
END;
$$;
basilean
  • 23
  • 5

2 Answers2

0

A word of warning: this style with dynamic SQL in SECURITY DEFINER functions can be elegant and convenient. But don't overuse it. Do not nest multiple levels of functions this way:

  • The style is much more error prone than plain SQL.
  • The context switch with SECURITY DEFINER has a price tag.
  • Dynamic SQL with EXECUTE cannot save and reuse query plans.
  • No "function inlining".
  • And I'd rather not use it for big queries on big tables at all. The added sophistication can be a performance barrier. Like: parallelism is disabled for query plans this way.

That said, your function looks good, I see no way for SQL injection. format() is proven good to concatenate and quote values and identifiers for dynamic SQL. On the contrary, you might remove some redundancy to make it cheaper.

Function parameters offset__i and limit__i are integer. SQL injection is impossible through integer numbers, there is really no need to quote them (even though SQL allows quoted string constants for LIMIT and OFFSET). So just:

format(' OFFSET %s LIMIT %s', offset__i, limit__i)

Also, after verifying that each key__v is among your legal column names - and while those are all legal, unquoted column names - there is no need to run it through %I. Can just be %s

I'd rather use text instead of varchar. Not a big deal, but text is the "preferred" string type.

Related:

COST 1 seems too low. The manual:

COST execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

Unless you know better, leave COST at its default 100.

Single set-based operation instead of all the looping

The whole looping can be replaced with a single SELECT statement. Should be noticeably faster. Assignments are comparatively expensive in PL/pgSQL. Like this:

CREATE OR REPLACE FUNCTION goods__list_json (_options json, _limit int = NULL, _offset int = NULL, OUT _result jsonb)
    RETURNS jsonb
    LANGUAGE plpgsql SECURITY DEFINER AS
$func$
DECLARE
   _tbl  CONSTANT text   := 'public.goods_full';
   _cols CONSTANT text[] := '{id, id__category, category, name, barcode, price, stock, sale, purchase}';   
   _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN"}';
   _sql           text;
BEGIN
   SELECT concat('SELECT jsonb_agg(t) FROM ('
           , 'SELECT ' || string_agg(t.col, ', '  ORDER BY ord) FILTER (WHERE t.arr->>0 = 'true')
                                               -- ORDER BY to preserve order of objects in input
           , ' FROM '  || _tbl
           , ' WHERE ' || string_agg (
                             CASE WHEN (t.arr->>1)::int BETWEEN  1 AND 10 THEN
                                format('%s %s %L'       , t.col, _oper[(arr->>1)::int], t.arr->>2)
                                  WHEN (t.arr->>1)::int BETWEEN 11 AND 12 THEN
                                format('%s %s %L AND %L', t.col, _oper[(arr->>1)::int], t.arr->>2, t.arr->>3)
                               -- ELSE NULL  -- = default - or raise exception for illegal operator index?
                             END
                           , ' AND '  ORDER BY ord) -- ORDER BY only cosmetic
           , ' OFFSET ' || _offset  -- SQLi-safe, no quotes required
           , ' LIMIT '  || _limit   -- SQLi-safe, no quotes required
           , ') t'
          )
   FROM   json_each(_options) WITH ORDINALITY t(col, arr, ord)
   WHERE  t.col = ANY(_cols)        -- only allowed column names - or raise exception for illegal column?
   INTO   _sql;

   IF _sql IS NULL THEN
      RAISE EXCEPTION 'Invalid input resulted in empty SQL string! Input: %', _options;
   END IF;
   
   RAISE NOTICE 'SQL: %', _sql;
   EXECUTE _sql INTO _result;
END
$func$;

db<>fiddle here

Shorter, faster and still safe against SQLi.

Quotes are only added where necessary for syntax or to defend against SQL injection. Burns down to filter values only. Column names and operators are verified against the hard-wired list of allowed options.

Input is json instead of jsonb. Order of objects is preserved in json, so you can determine the sequence of columns in the SELECT list (which is meaningful) and WHERE conditions (which is purely cosmetic). The function observes both now.

Output _result is still jsonb. Using an OUT parameter instead of the variable. That's totally optional, just for convenience. (No explicit RETURN statement required.)

Note the strategic use of concat() to silently ignore NULL and the concatenation operator || so that NULL makes the concatenated string NULL. This way, FROM, WHERE, LIMIT, and OFFSET are only inserted where needed. A SELECT statement works without either of those. An empty SELECT list (also legal, but I suppose unwanted) results in a syntax error. All intended.
Using format() only for WHERE filters, for convenience and to quote values. See:

The function isn't STRICT anymore. _limit and _offset have default value NULL, so only the first parameter _options is required. _limit and _offset can be NULL or omitted, then each is stripped from the statement.

Using text instead of varchar.

Made constant variables actually CONSTANT (mostly for documentation).

Other than that the function does what your original does.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hello Erwin, Thank you very much for your time and advice, I really appreciate it. Now that I'm feel secure, I will work on it having in mind your comments. I will edit and add the details you commented into original answer for further reference. – basilean May 13 '21 at 05:00
  • 1
    Assignment is slow in PL/pgSQL and inside performance critical section should be reduced, when it is possible. On second hand the speed of assign statements was significantly enhanced in Postgres 13. PL/pgSQL is and will be interpreted language, so any statement (like assignment) has fixed cost, and when it is possible (when code is readable still), then it should be reduced. – Pavel Stehule May 13 '21 at 05:04
  • Hello Pavel, thank you for your time and advise. Knowing that is secure is time to get my hands dirty and begin with improvements. =) – basilean May 13 '21 at 13:36
  • @basilean: Got a little present for you. – Erwin Brandstetter May 13 '21 at 21:01
  • @ErwinBrandstetter WoooW.... thank you very much man! I came back to show improvements and ask new questions and just found updates... gonna study new comments before post. =D – basilean May 14 '21 at 14:29
  • @ErwinBrandstetter I want to really thank you for your time teaching. I spent yesterday trying to figure out how to replace loop, how you use concat() with string_agg() inside going through each element just blew my mind, it opened a whole new way of nesting for me. Besides all clarifications on right places at post and comments, the elegance in the code is a lesson by itself. By the end, the introduction of fiddle. I just need to add "In" and "not in" in the operators using this new technic and it would be completed . Again, thank you very much for share your knowledge. – basilean May 14 '21 at 16:01
  • @I think I got it =): WHEN (t.arr->>1)::int BETWEEN 13 AND 14 THEN format('%s %s (%s)', t.col, _oper[(arr->>1)::int], ( SELECT string_agg(format('%L', j), ',') FROM json_array_elements_text(t.arr->2) j)) – basilean May 14 '21 at 17:21
  • Since your are getting all excited about this, I added a word of warning to curb your enthusiasm. ;) – Erwin Brandstetter May 15 '21 at 01:16
  • @ErwinBrandstetter - Yeah... I feel like a made a level up!... No problem... I love Larry =D – basilean May 15 '21 at 15:13
  • @ErwinBrandstetter my brain hurts. in dbfiddle link, why `SELECT goods__list_json('{"name": [true, 7, "Ad%"] , "category": [true] , "stock": [false, 4, 0]}' , 20, 0);` will invoke 'AND', since 7 should be in` format('%s %s %L' ` – jian Feb 18 '22 at 10:42
  • @Mark: Sorry, I don't understand the question. – Erwin Brandstetter Feb 18 '22 at 23:09
0

I tried to put all that I learned here and I came up with this below and new questions =D.

  • Is there any advantage declaring _oper this way '{LIKE, "NOT LIKE"}' instead of ARRAY['LIKE', 'NOT LIKE']?
  • Casting as int _limit and _offset, I'm assuming no SQLi, right?
  • Is it an elegant way for 'IN' and 'NOT IN' CASE? I wonder why string_agg() is allowed nested in concat() but not there where I needed to use a sub query.

This is a naive private function.

Edit: Removed "SECURITY DEFINER" as identified dangerous.

CREATE FUNCTION public.crud__select (IN _tbl text, IN _cols text[], IN _opts json, OUT _data jsonb)
    LANGUAGE plpgsql STRICT AS
$$
DECLARE
    _oper CONSTANT text[] := '{<, >, <=, >=, =, <>, LIKE, "NOT LIKE", ILIKE, "NOT ILIKE", BETWEEN, "NOT BETWEEN", IN, "NOT IN"}';
BEGIN
    EXECUTE (
        SELECT concat('SELECT jsonb_agg(t) FROM ('
                , 'SELECT ' || string_agg(e.col, ', ' ORDER BY ord) FILTER (WHERE e.arr->>0 = 'true')
                , ' FROM ', _tbl
                , ' WHERE ' || string_agg(
                    CASE
                        WHEN (e.arr->>1)::int BETWEEN  1 AND 10 THEN
                            format('%s %s %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2)
                        WHEN (e.arr->>1)::int BETWEEN 11 AND 12 THEN
                            format('%s %s %L AND %L', e.col, _oper[(e.arr->>1)::int], e.arr->>2, e.arr->>3)
                        WHEN (e.arr->>1)::int BETWEEN 13 AND 14 THEN
                            format('%s %s (%s)', e.col, _oper[(e.arr->>1)::int], (
                                SELECT string_agg(format('%L', ee), ',') FROM json_array_elements_text(e.arr->2) ee)
                            )
          END, ' AND ')
                , ' OFFSET ' || (_opts->>'_offset')::int
                , ' LIMIT '  || (_opts->>'_limit')::int
                , ') t'
        )
    FROM json_each(_opts) WITH ORDINALITY e(col, arr, ord)
        WHERE e.col = ANY(_cols)
    ) INTO _data;
END;
$$;

Then for table or view, I create wrapper function executable for some roles.

CREATE FUNCTION public.goods__select (IN _opts json, OUT _data jsonb)
    LANGUAGE sql STRICT SECURITY DEFINER AS
$$
    SELECT public.crud__select(
        'public.goods_full',
        ARRAY['id', 'id__category', 'category', 'name', 'barcode', 'price', 'stock', 'sale', 'purchase'],
        _opts
    );
$$;

SELECT public.goods__select('{"_limit": 10, "name": [true, 9, "a%"], "id__category": [true, 13, [1, 2]], "category": [true]}'::json);

[{"name": "Atlantis II", "category": "Atari 2600", "id__category": 1}, .. , {"name": "Amidar", "category": "Atari 2600", "id__category": 1}]

basilean
  • 23
  • 5
  • Now your inner function `public.crud__select()` is **unsafe**! Remove `SECURITY DEFINER` from it or anyone can pass evil things to be executed by the definer of the function. It's enough that the outer function `public.goods__select ()` is `SECURITY DEFINER` and created by a privileged role like the owner of the respective table. (Ideally **not a superuser** to contain the scope of possible errors like you just displayed!) Any called functions (which are not `SECURITY DEFINER` themselves) inherit the `current_user` from the outer function call. – Erwin Brandstetter May 15 '21 at 00:56
  • Also, I would reserve the naming convention with leading underscore to plpgsql variables (and parameters), not JSON keys which do not have to be valid SQL identifiers. So `_opts->>'offset'` and `_opts->>'limit'`. And since 'limit' and 'offset' are nested as JSON keys now, move the column list to a separate nested object (else you process additional keys needlessly): `SELECT public.goods__select('{"limit": 10, "columns": {"name": [true, 9, "a%"], "id__category": [true, 13, [1, 2]], "category": [true]}}'::json);` Use `... FROM json_each(_opts->'columns') ...` in the function accordingly. – Erwin Brandstetter May 15 '21 at 00:57
  • For any more questions (including the ones added above): please ask them as *questions*. Comments are not the place. – Erwin Brandstetter May 15 '21 at 00:57
  • @ErwinBrandstetter - Once again, thank you very much for your comments. I will keep last advice in mind from now on for next questions. =) – basilean May 15 '21 at 14:45
  • A small clarification, _cols will never have offset or limit (and maybe in a future order) so my understanding is that they got filter at ANY(_cols). crud_select is denied for every one and just get called by the owner that is same as good_select, but I get your point and you right, thank you! =) – basilean May 15 '21 at 15:24