4

I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.

The plan is to call the function like this:

-- hypothetical "settings" table, with a primary key of (user_id, setting):

SELECT merge(
    $$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
    $$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);

Here's the full code of the merge() function:

CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
    max_iterations INTEGER := 10;
    i INTEGER := 0;
    num_updated INTEGER;
BEGIN
    -- usually returns before re-entering the loop
    LOOP

        -- first try the update
        EXECUTE update_sql;
        GET DIAGNOSTICS num_updated = ROW_COUNT;
        IF num_updated > 0 THEN
            RETURN 'UPDATE';
        END IF;

        -- nothing was updated: try the insert, watching out for concurrent inserts
        BEGIN
            EXECUTE insert_sql;
            RETURN 'INSERT';
        EXCEPTION WHEN unique_violation THEN
            -- nop; just loop and try again from the top
        END;

        -- emergency brake
        i := i + 1;
        IF i >= max_iterations THEN
            RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
            EXIT;
        END IF;

    END LOOP;
END;
$func$
LANGUAGE plpgsql;

It appears to work well enough in my tests, but I'm not certain if I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?

Among the resources I consulted for this function are:

(Edit: one of the goals was to avoid locking the target table.)

Community
  • 1
  • 1
Zilk
  • 8,917
  • 7
  • 36
  • 44

1 Answers1

1

The answer to your question depends your the context of how your application(s) will access the database. There are many ways to solve this as nicely discussed in depesz's post you cited by yourself. In addition you might want to also consider using writeable CTEs see here. Also the [question]Insert, on duplicate update in PostgreSQL? has some interesting discussions for your decision making process.

Community
  • 1
  • 1
renzo
  • 135
  • 1
  • 2
  • 10
  • I did consider writeable CTEs, but as the blog you linked mentions, there's a race condition with concurrent inserts. The point of the `merge()` function was to "always succeed, eventually", as far as that is possible. Granted, passing SQL strings to the function is not very elegant, but a) I have yet to see a comparable solution and b) the INSERT/UPDATE queries are auto-generated, which effectively hides the ugliness behind an API. – Zilk Sep 27 '13 at 13:59