PL/pgSQL - Find out which items of an array exist in the table

Question

I am trying to find a way to split a text array into two arrays by their existence in a table.

Let's say that the users table has these three IDs: 1, 2, 3, and I have an input_array_ids with values 1, 2, 5. IDs 1, 2 exist in the table so it goes into another array called ids_valid, and ID 5 goes into ids_invalid. By the end, I want to run an update on valid IDs and notify the users which IDs have been updated and which IDs couldn't be found in the table.

CREATE FUNCTION update_status(p_id character varying DEFAULT NULL::character varying,
                              p_status character varying DEFAULT NULL::character varying) RETURNS character varying
    LANGUAGE plpgsql
AS
$$
DECLARE

    v_valid_ids   users.id%type;
    v_invalid_ids users.id%type;
    v_result      varchar(255);
    v_ids         text[];
BEGIN
    v_ids = string_to_array(regexp_replace(p_id, '[\s+]', '', 'g'), ',');

    --Do some magic to split v_ids into v_valid_ids and v_invalid_ids

    UPDATE users
    SET row_status = p_status,
        updated_by = 'hxBiSP'
    WHERE id = ANY (v_valid_ids);

    v_result = 'Updated: ' || string_agg(v_valid_ids::character varying, ', ') || 'Could not update: ' || string_agg(v_invalid_ids::character varying, ', ');
    RETURN v_result;

END ;
$$;

ALTER FUNCTION update_status(varchar, varchar) OWNER TO postgres;

How can I do it?

You show an example with `integer` arrays, but the function works with `varchar` input, a `text` array, and obfuscated types. The actual data type of `id` matters. (As does your version of Postgres, always.) And tell us more about the input: A text string? What specs exactly? — Erwin Brandstetter, Sep 27 '20 at 23:07

score 1 · Answer 1 · answered Sep 26 '20 at 22:53

I don't think that you need plgpsql to do what you want: CTEs are good enough. Consider the following sql function:

create function update_status(p_id text, p_status text) 
returns text
as $$
    with 
        t_ids(id) as (select regexp_split_to_table('1, 2, 4', ',\s*')::int),
        upd as (
            update users u
            set row_status = p_status, updated_by = 'hxbisp'
            from t_ids i 
            where i.id = u.id
        )
    select 'updated: ' || string_agg(u.id::text, ', ') 
        || ' - could not update: ' || string_agg(i.id::text, ', ') filter(where u.id is null)
    from t_ids i
    left join users u on u.id = i.id        
$$ 
language sql;

The first CTE, t_ids, unnest the input string to rows. Then, upd performs with a join. Finally, the outer query generates the status string that represent the matched and unmatched ids, using a left join and aggregation - that is the resultset of the function.

Demo on DB Fiddle:

-- initial table content
select * from users order by id;

id | name | row_status | updated_by
-: | :--- | :--------- | :---------
 1 | foo  |            |                    
 2 | bar  |            |                    
 3 | baz  |            |                    


-- call the function
select update_status('1, 2, 4', 'zoo');

| update_status                       |
| :---------------------------------- |
| updated: 1, 2 - could not update: 4 |

-- check the content of the table
select * from users order by id;

id | name | row_status | updated_by
-: | :--- | :--------- | :---------
 1 | foo  | zoo        | hxbisp    
 2 | bar  | zoo        | hxbisp    
 3 | baz  |            |

Erwin Brandstetter · Answer 2 · 2020-09-28T13:04:41.807

Dividing IDs into two sets of existing and not existing (v_valid_ids and v_invalid_ids in your code) beforehand, opens you up to race conditions. What if IDs are deleted / updated / inserted by concurrent transactions?

Just run the UPDATE on the original set (list / array) of IDs and use the RETURNING clause to return IDs that were actually affected. See:

Not only is typically much cheaper (only a single pass over the target table), it is also safe against race conditions.

Assuming integer IDs, which allows simpler and faster code, yet, with the additional module intarray - needs to be installed once per database with

CREATE EXTENSION intarray;

See:

(But we can make this work for any data type.)

CREATE OR REPLACE FUNCTION f_update_status(_ids_as_string text = NULL
                                         , _status text = NULL)
   RETURNS text
   LANGUAGE plpgsql AS
$func$
DECLARE
   _ids     int[] := string_to_array(_ids_as_string, ',')::int[];  -- ①
   _updated int[];
BEGIN
   WITH upd AS (
      UPDATE users
      SET    row_status = _status
           , updated_by = 'hxBiSP'
      WHERE  id = ANY (_ids)  -- just use input array
      RETURNING id            -- returns actually affected IDs
      )
   SELECT ARRAY(TABLE upd)
   INTO _updated;  -- ③ aggregate into array with ARRAY constructor

   RETURN format('Updated: %s. Not found: %s.'
               , _updated::text
               , (_ids - _updated)::text);  -- ②
END
$func$;

db<>fiddle here

Or use a VARIADIC function to pass actual integer values or an actual integer array as single argument, like a_horse suggested under your previous question:

Splitting comma separated string in PL/pgSQL function

① Variable can be assigned at declaratin time. And leading or trailing white space is trimmed from integer numbers automatically.

② Using the integer[] - integer[] → integer[] operator provided by intarray.

③ About the ARRAY constructor:

Why is array_agg() slower than the non-aggregate ARRAY() constructor?

How to pass a single array instead of a dictionary of values to a VARIADIC function:

PL/pgSQL - Find out which items of an array exist in the table

2 Answers2