4

I am trying to create a function that takes a table and a variable number of columns as arguments, and then returns a table without rows that have duplicates on all of those columns. I am trying to figure out how to have a variable number of columns as arguments, and I have gathered that I will probably need a VARIADIC argument, but I am not sure how to implement it. What I have so far:

CREATE FUNCTION remove_duplicates(orig_table, VARIADIC sel_columns column)
RETURNS table AS $$
    SELECT * FROM 
        (SELECT *,
            count(*) over (partition by sel_columns) AS count
        FROM orig_table)
    WHERE count = 1;
$$ LANGUAGE SQL;

As an example, if I had a table like this:

cola | colb | colc
-------------------
a    | b    | 1
a    | b    | 2
a    | c    | 3
a    | d    | 4

I would like to run SELECT * FROM remove_duplicates(mytable, cola, colb) and get this result:

cola | colb | colc
-------------------
a    | c    | 3
a    | d    | 4

Thank you for the help. I'm using postgresql 9.4.9

Amadou Kone
  • 907
  • 11
  • 21

1 Answers1

2

You'll cannot get what you want with a simple SQL function, you need the power of a procedural language. A possible solution is:

CREATE OR REPLACE FUNCTION remove_duplicates(orig_table anyelement, VARIADIC sel_columns text[])
RETURNS SETOF anyelement AS $$
DECLARE
    orig_table_columns TEXT;
BEGIN
    SELECT array_to_string(array_agg(quote_ident(column_name)),',') INTO orig_table_columns FROM information_schema.columns WHERE table_name = CAST(pg_typeof(orig_table) AS TEXT);
    RETURN QUERY EXECUTE 'SELECT ' || orig_table_columns || ' FROM '
        || '(SELECT *, '
        || '    count(*) over (partition by ' || array_to_string(sel_columns, ',') || ') AS count '
        || 'FROM ' || pg_typeof(orig_table) || ') AS tmp '
        || ' WHERE count = 1 ';
END
$$ LANGUAGE PLPGSQL;

SELECT * FROM remove_duplicates(NULL::tests, 'cola', 'colb');

Don't forget to do your changes to avoid SQL Injection.

EDIT: For a very good explanation about functions with dynamic return types see Erwin's answer here.

Community
  • 1
  • 1
Michel Milezzi
  • 10,087
  • 3
  • 21
  • 36
  • Thanks, this works! I'm a little confused as to why it's necessary to make `orig_table` type `anyelement`, and why the table argument has to be `NULL` that is cast to a specific table. – Amadou Kone May 16 '17 at 19:16
  • @AmadouKone anyelement because we're taking advantage of [Polymorphic Types](https://www.postgresql.org/docs/current/static/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC). Casting to null is just a trick to pass the type of desired table. – Michel Milezzi May 16 '17 at 19:32
  • @AmadouKone Also note the return type is anyelement too. This makes the function polymorphic. – Michel Milezzi May 16 '17 at 19:39