1

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite some databases and different schemas and tables I wanted to start using variables to reduce chance of errors and the amount of work it will take me.

This is the best example query I could think off, but it doesn't work:

do $$
declare @source_schema  varchar := 'my_source_schema';
declare @source_table   varchar := 'my_source_table';
declare @target_table   varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare @unique_keys    varchar := ('1', '2', '3') 

begin 
select into @target_table
from @source_schema.@source_table
where id in (@unique_keys);

delete from @source_schema.@source_table where export_id in (@unique_keys);

end ;
$$;

The query syntax works with hard-coded values.

Most of the times my variables are perceived as columns or not recognized at all. :(

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Bobstar
  • 137
  • 13
  • 2
    `@` is invalid in an identifier in SQL and also not valid for variables in PL/pgSQL. Additionally: `declare` starts a _block_ to declare one or more variables. There is no need to start a new declare block for each variable. Also: the standard compliant `create table new_table as select ...` is preferred over the non-standard `select into new_table from ...` –  Jan 17 '22 at 11:41
  • You need [dynamic SQL](https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN) for this –  Jan 17 '22 at 11:41
  • 1
    And a link to the manual about plpgsql: https://www.postgresql.org/docs/current/plpgsql.html – Frank Heikens Jan 17 '22 at 13:17

3 Answers3

1

You need to create and then call a plpgsql procedure with input parameters :

CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN

EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;

END ;
$$ ;

The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .

See the test result in dbfiddle.

Edouard
  • 6,577
  • 1
  • 9
  • 20
  • Thanks @Eduard. A little though for me to directly understand, but I scheduled more time for this next week. Hopefully I can figure out how the above works and get it to work – Bobstar Jan 19 '22 at 19:03
0

As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.

You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.

Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:

CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
  LANGUAGE plpgsql AS
$proc$
   -- target schema and appendix are always the same, source_table is a variable input
DECLARE
   _target_schema CONSTANT text := 's2';  -- hardcoded
   _target_table  text := _source_table || '_duplicates';
   _sql           text := format(
'WITH del AS (
   DELETE FROM %I.%I
   WHERE  id = ANY($1)
   RETURNING *
   )
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
                            , _target_schema, _target_table);
BEGIN
   RAISE NOTICE '%', _sql;           -- debug
   EXECUTE _sql USING _unique_keys;  -- execute

   GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;

Call:

CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);

db<>fiddle here

I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:

Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:

Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use a the actual data type of your id columns!

The variable _sql holds the query string, so it can easily be debugged before actually executing. Using RAISE NOTICE '%', _sql; for that purpose.
I suggest to comment the EXECUTE line until you are sure.

I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:

Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:


Or you can build the statements in a client program. Like psql, and use \gexec. Example:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

Based on Erwin's answer, minor optimization...

create or replace procedure pg_temp.p_archive_dump
    (_source_schema text, _source_table text,
        _unique_key int[],_target_schema text)
language plpgsql as
    $$
    declare
        _row_count bigint;
        _target_table text := '';
    BEGIN
        select quote_ident(_source_table) ||'_'|| array_to_string(_unique_key,'_')   into _target_table from quote_ident(_source_table);
        raise notice 'the deleted table records will store in %.%',_target_schema, _target_table;
        execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );

        execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'

            ,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
    GET DIAGNOSTICS _row_count = ROW_COUNT;
    RAISE notice 'rows influenced, %',_row_count;
    end
$$;

-- if your _unique_key is not that much, this solution also create a table for you. Obviously you need to create the target schema yourself.
If your unique_key is too much, you can customize to properly rename the dumped table.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is source table, {1,2} is the unique key you want to extract to the new table. s2 is the target schema

jian
  • 4,119
  • 1
  • 17
  • 32