How to replace all subsets of characters based on values of other tables in pl/pgsql?

Question

I've been doing some research on how to replace a subset of string of characters of a single row base on the values of the columns of other rows, but was not able to do so since the update are only for the first row values of the other table. So I'm planning to insert this in a loop in a plpsql function.

Here are the snippet of my tables. Main table:

 Table "public.tbl_main"
        Column         |  Type  | Modifiers 
-----------------------+--------+-----------
 maptarget             | text   | 
 expression            | text   | 


 maptarget |                 expression
-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 43194-0   | 363787002:70434600=(386053000:704347000=(414237002:704320005=259470008,704318007=118539007,704319004=50863008),704327008=122592007,246501002=703690001,370132008=30766002)

Look-up table:

Table "public.tbl_values"
        Column         |  Type  | Modifiers 
-----------------------+--------+-----------
 conceptid             | bigint | 
 term                  | text   |

 conceptid |                   term                   
-----------+------------------------------------------
 386053000 | Patient evaluation procedure (procedure)
 363787002 | Observable entity (observable entity)
 704347000 | Observes (attribute)
 704320005 | Towards (attribute)
 704318007 | Property type (attribute)

I want to create a function that will replace all numeric values in the tbl_main.expression columns with their corresponding tbl_values.term using the tbl_values.conceptid as the link to each numeric values in the expression string.

I'm stuck currently in the looping part since I'm a newbie in LOOP of plpgsql. Here is the rough draft of my function.

--create first a test table
drop table if exists tbl_test;
create table tbl_test as select * from tbl_main limit 1;
--

create or replace function test () 
 RETURNS SETOF tbl_main
 LANGUAGE plpgsql
AS $function$
declare
 resultItem tbl_main;
 v_mapTarget text;
 v_expression text;
 ctr int;
begin
  v_mapTarget:='';
  v_expression:='';
  ctr:=1;

  for resultItem in (select * from tbl_test) loop
     v_mapTarget:=resultItem.mapTarget;
     select into v_expression expression from ee;
     raise notice 'parameter used: %',v_mapTarget;
     raise notice 'current expression: %',v_expression;

     update ee set expression=replace(v_expression, new_exp::text, term) from (select new_exp::text, term from tbl_values offset ctr limit 1) b ; 
     ctr:=ctr+1;
     raise notice 'counter: %', ctr;
     v_expression:= (select expression from ee);
     resultItem.expression:= v_expression;
     raise notice 'current expression: %',v_expression;
return next resultItem;
 end loop;
 return;
 end;
$function$;

Any further information will be much appreciated.
My Postgres version:

PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

I dont have experience in plpgsql but I would do the following: Create something like a temporary table and insert the splitted values of your tbl_main. Than do an simple join with your tbl_values. In case you can change the tables, I would prefer to do that. — Andre, Apr 16 '15 at 07:39
Why do you have `conceptid` 386053000 three times? I'd expect the column to be unique. Also, what is your version of Postgres? — Erwin Brandstetter, Apr 17 '15 at 07:51

score 0 · Accepted Answer · edited May 23 '17 at 12:22

PL/pgSQL function with dynamic SQL

Looping is always a measure of last resort. Even in this case it is substantially cheaper to concatenate a query string using a query, and execute it once:

CREATE OR REPLACE FUNCTION f_make_expression(_expr text, OUT result text) AS
$func$
BEGIN
   EXECUTE (
      SELECT 'SELECT ' || string_agg('replace(', '') || '$1,'
           || string_agg(format('%L,%L)', conceptid::text, v.term), ','
                         ORDER BY conceptid DESC)
      FROM  (
         SELECT conceptid::bigint
         FROM   regexp_split_to_table($1, '\D+') conceptid
         WHERE  conceptid <> ''
         ) m
      JOIN   tbl_values v USING (conceptid)
      )
   USING _expr
   INTO result;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT *, f_make_expression(expression) FROM tbl_main;

However, if not all conceptid have the same number of digits, the operation could be ambiguous. Replace conceptid with more digits first to avoid that - ORDER BY conceptid DESC does that - and make sure that replacement strings do not introduce ambiguity (numbers that might be replaced in the the next step). Related answer with more on these pitfalls:

Replace a string with another string from a list depending on the value

The token $1 is used two different ways here, don't be misled:

```
regexp_split_to_table($1, '\D+')
```
This one references the first function parameter _expr. You could as well use the parameter name.
```
|| '$1,'
```
This concatenates into the SQL string a references to the first expression passed via USING clause to EXECUTE. Parameters of the outer function are not visible inside EXECUTE, you have to pass them explicitly.

It's pure coincidence that $1 (_expr) of the outer function is passed as $1 to EXECUTE. Might as well hand over $7 as third expression in the USING clause ($3) ...

I added a debug function to the fiddle. With a minor modification you can output the generated SQL string to inspect it:

SQL function

Here is a pure SQL alternative. Probably also faster:

CREATE OR REPLACE FUNCTION f_make_expression_sql(_expr text)
  RETURNS text AS
$func$
SELECT string_agg(CASE WHEN $1 ~ '^\d'
                       THEN txt || COALESCE(v.term, t.conceptid) 
                       ELSE COALESCE(v.term, t.conceptid) || txt END
                , '' ORDER BY rn) AS result
FROM  (
   SELECT *, row_number() OVER () AS rn
   FROM  (
      SELECT regexp_split_to_table($1, '\D+') conceptid
           , regexp_split_to_table($1, '\d+') txt
      ) sub
   ) t
LEFT  JOIN tbl_values v ON v.conceptid = NULLIF(t.conceptid, '')::int
$func$  LANGUAGE sql STABLE;

In Postgres 9.4 this can be much more elegant with two new features:

ROWS FROM to replacing the old (weird) technique to sync set-returning functions
WITH ORDINALITY to get row numbers on the fly reliably:
- PostgreSQL unnest() with element number

CREATE OR REPLACE FUNCTION f_make_expression_sql(_expr text)
  RETURNS text AS
$func$
SELECT string_agg(CASE WHEN $1 ~ '^\d'
                       THEN txt || COALESCE(v.term, t.conceptid) 
                       ELSE COALESCE(v.term, t.conceptid) || txt END
                , '' ORDER BY rn) AS result
FROM   ROWS FROM (
          regexp_split_to_table($1, '\D+')
        , regexp_split_to_table($1, '\d+')
       ) WITH ORDINALITY AS t(conceptid, txt, rn)
LEFT   JOIN tbl_values v ON v.conceptid = NULLIF(t.conceptid, '')::int
$func$  LANGUAGE sql STABLE;

SQL Fiddle demonstrating all for Postgres 9.3.

Thanks for this. But its throwing an error when I'm using it. `ERROR: function f_make_expression(text) does not exist LINE 1: SELECT *, f_make_expression(expression) FROM tbl_main...` I did create the function though... `\df f_make_expresson List of functions Schema | Name | Result data type | Argument data types | Type --------+------------------+------------------+-----------------------------+-------- public | f_make_expresson | text | _expr text, OUT result text | normal ` — Taygz, Apr 20 '15 at 07:41
@Erwin: `f_make_expresson` <-> `f_make_expression`. I had a typo that you copied. — Erwin Brandstetter, Apr 20 '15 at 07:53
Oops! my bad. I already fixed that part but now I'm having trouble with the parameter the function is throwing. It always gives me this error when I'm running the function but when I run the query manually, it works. `ERROR: query string argument of EXECUTE is null` — Taygz, Apr 20 '15 at 09:40
@Erwin: This should not be. One of your operands in the string concatenation must be NULL, which nullifies the whole query string. You might debug using the [`concat()` function](http://stackoverflow.com/questions/12310986/combine-two-columns-and-add-into-one-new-column/12320369#12320369) instead, but ultimately you must find and eliminate the NULL value. I added an improvement to the function (which cannot explain the error you got). Either way, check out the fiddle: it works in Postgres 9.3. — Erwin Brandstetter, Apr 20 '15 at 23:03
@Erwin: Also, I had another idea and added a pure SQL solution. — Erwin Brandstetter, Apr 21 '15 at 06:15

score 0 · Answer 2 · answered Sep 17 '15 at 14:40

There's also another way, without creating functions... using "WITH RECURSIVE". Used it with lookup talbe of thousands of rows.

You'll need to change following table names and columns to your names:

tbl_main, strsourcetext, strreplacedtext;

lookuptable, strreplacefrom, strreplaceto.

WITH RECURSIVE replaced AS (
(SELECT
strsourcetext,
strreplacedtext,
array_agg(strreplacefrom ORDER BY length(strreplacefrom) DESC, strreplacefrom, strreplaceto) AS arrreplacefrom,
array_agg(strreplaceto ORDER BY length(strreplacefrom) DESC, strreplacefrom, strreplaceto) AS arrreplaceto,
count(1) AS intcount,
1 AS intindex
FROM tbl_main, lookuptable WHERE tbl_main.strsourcetext LIKE '%' || strreplacefrom || '%'
GROUP BY strsourcetext)
UNION ALL 
SELECT
strsourcetext,
replace(strreplacedtext, arrreplacefrom[intindex], arrreplaceto[intindex]) AS strreplacedtext,
arrreplacefrom,
arrreplaceto,
intcount,
intindex+1 AS intindex
FROM replaced WHERE intindex<=intcount
)
SELECT strsourcetext,
(array_agg(strreplacedtext ORDER BY intindex DESC))[1] AS strreplacedtext
FROM replaced 
GROUP BY strsourcetext

How to replace all subsets of characters based on values of other tables in pl/pgsql?

2 Answers2

PL/pgSQL function with dynamic SQL

SQL function

Linked