Apply `trim()` and `regexp_replace()` on text array

Question

I am trying to transform a PostgreSQL text array into one where each value is replaced by one where each white space is surrounded by a character on each side. In other words, I am trying to apply trim() and regexp_replace() on each value in a text array. This is all done (among other things) inside a database function.

    CREATE OR REPLACE FUNCTION manipulate_array(multiplevalues text[])
    RETURNS text[] AS 
    $BODY$
        DECLARE 
            singlevalue text;

        BEGIN
            FOREACH singlevalue IN ARRAY multiplevalues LOOP
                SELECT trim(regexp_replace(singlevalue, '\s+', ' ', 'g')) INTO singlevalue;
            END LOOP;

            RETURN multiplevalues;
        END;
    $BODY$
    LANGUAGE plpgsql VOLATILE
    COST 100;
    ALTER FUNCTION manipulate_array(multiplevalues text[]) OWNER TO username;

Unfortunately, when I call the function with multiplevalues = '{" red ", " blue ", " Ye llow "}' as one of the arguments the returned value is the exact same text array. How do I get '{"red", "blue", "yellow"}' as a return value?

I have been staring at the definitions for trim(), regexp_replace() and the FOREACH loop for a while now, and I might just need a check from someone else.

score 1 · Accepted Answer · 2016-08-30T20:17:50.630

Your code never changes the multiplevalues array. It just changes each element, then throws that new value away.

You need a variable where you can aggregate your results into:

CREATE OR REPLACE FUNCTION manipulate_array(multiplevalues text[])
RETURNS text[] AS 
$BODY$
  DECLARE 
    singlevalue text;
    l_result text[] := '{}'::text[]; -- initialize with an empty array
  BEGIN
    FOREACH singlevalue IN ARRAY multiplevalues LOOP
        SELECT trim(regexp_replace(singlevalue, '\s+', ' ', 'g')) INTO singlevalue;
        l_result := l_result || singlevalue; -- append to the result
    END LOOP;

    RETURN l_result; -- return the new array, not the old one
  END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

But this can be done a bit simpler using a unnest and array_agg and a plain SQL function (rather then PL/pgSQL)

You need to first unnest the array, trim the values and the aggregate that back into an array.

I am not sure I understand what you are trying to do, but this will trim all values inside the array and return a new one:

create function trim_all(p_values text[])
  returns text[]
as
$$
  select array_agg(trim(regexp_replace(t.v, '\s+', ' ', 'g')) order by t.nr)
    from unnest(p_values) with ordinality as t(v, nr);
$$
language sql;

Thanks for both your first and second answer. This works (and makes a lot of sense)! — tedvangageldonk, Aug 30 '16 at 20:21

Erwin Brandstetter · Answer 2 · 2023-04-17T02:57:34.290

A couple more improvements:

CREATE FUNCTION trim_all1(text[])
  RETURNS text[]
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT ARRAY (
   SELECT regexp_replace(trim(elem), '\s+', ' ', 'g')
   FROM   unnest($1) elem
   )
$func$;

It's typically faster to apply the cheap trim() before the expensive regexp_replace().

A simple array constructor is faster than array_agg() to aggregate a single array.

Make the function IMMUTABLE (because it is). Faster in certain contexts and can be used in indexes, etc. See:

How do IMMUTABLE, STABLE and VOLATILE keywords effect behaviour of function?

Add PARALLEL SAFE in Postgres 9.6 or later, because the function qualifies.

If you are dealing with plain space characters exclusively, further simplify:

CREATE FUNCTION trim_all2(text[])
  RETURNS text[]
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT ARRAY (
   SELECT regexp_replace(trim(elem), '  +', ' ', 'g')
   FROM   unnest($1) elem
   )
$func$;

Regular expressions are comparatively expensive. If you further know that there are never more than n consecutive space characters, you can make do with simple replace(). A couple of nested calls are still cheaper. For max. 8 consecutive spaces:

CREATE OR REPLACE FUNCTION trim_all3(text[])
  RETURNS text[]
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT ARRAY (
   SELECT replace(replace(replace(
             trim(elem)
           , '     ', ' ')     -- 5   
           , '   '  , ' ')     -- 3
           , '  '   , ' ')     -- 2
   FROM   unnest($1) elem
   )
$func$;

In my tests on pg 9.5 each is faster than the previous version and the last one is more than twice as fast as @a_horse's function.

Find multiple strings and replace with single string in Postgres

Does it retain the original order of elements?

The SQL standard would allow any order without explicit ORDER BY. But in this simple arrangement, Postgres will keep the original order of elements. See:

PostgreSQL unnest() with element number

Interesting. Does the `array(...)` version retain the ordering of the elements in the array? — , Aug 31 '16 at 05:45
@a_horse_with_no_name: Yes, but that's not warranted by the SQL standard. It's an implementation detail - that's not going to change for the simple case. — Erwin Brandstetter, Aug 31 '16 at 10:07
The reason I chose the `array_agg()` was because I could retain the order of the original entries in the array through the `ordinality` and an `order by`. — , Aug 31 '16 at 10:10
@a_horse_with_no_name: You can do the same with an array constructor. I only chose to drop it, because we can, and it saves a bit of time. Adding `ORDER BY` is cleaner either way. The OP also did not explicitly specify whether the order of elements needs to be preserved. — Erwin Brandstetter, Aug 31 '16 at 10:14

Apply `trim()` and `regexp_replace()` on text array

2 Answers2

Does it retain the original order of elements?

Linked