1

I have created and automated a custom workflow for eagerly synchronizing a series of materialized views. After trying several different approaches (for one to many relationships), I have found the most reliable synchronization workflow is to delete all records that may have been impacted and then insert the new records.

DELETE FROM
    some_materialized_view
WHERE
    set_of_records_key = some_value;

INSERT INTO
    some_materialized_view
SELECT
    *
FROM
    some_query_generating_some_materialized_view;

Note: some_query_generating_some_materialized_view is a complex read operation that takes a non-trivial amount of resources to execute. Additionally, some_materialized_view is heavily indexed with several foreign keys and other constraints.

This feels extremely heavy handed. This workflow comes with excessive delete and insert operations that are often times needless as some of the deleted records may have been identical, or similar enough to be a candidate for an UPDATE.

I would prefer something like the following:

DELETE FROM
    some_materialized_view
USING
    (
        SELECT
            unique_key
        FROM
            some_materialized_view
        WHERE
            set_of_records_key = some_value

        EXCEPT
        INSERT INTO
            some_materialized_view
        SELECT
            *
        FROM
            some_query_generating_some_materialized_view
        ON CONFLICT (...) DO UPDATE
        SET 
            foo = EXCLUDED.foo,
            bar = EXCLUDED.bar,
            ...
        WHERE
            some_materialized_view <> EXCLUDED
        RETURNING
            unique_key
    ) AS sub_query
WHERE
    some_materialized_view.unique_key = sub_query.unique_key;

The problem is in the ON CONFLICT ... DO UPDATE ... WHERE ... RETURNING clause.

as addressed in this question: How to use RETURNING with ON CONFLICT in PostgreSQL?

the RETURNING clause only returns impacted records. So records not impacted are not returned, and thus (in the example above) deleted inappropriately.

It seems the only way to get RETURNING to actually return all records is unnecessarily update identical records by removing the WHERE some_materialized_view <> EXCLUDED clause, or run some_query_generating_some_materialized_view again in another EXCEPT clause... both options are also not ideal.

So, what am I missing? Are there other options available? If not, in general, is it preferred to perform a complex, resource intensive, read operation over a needless UPDATE (remembering the associated index maintenance and check constraints)?

Note: I'm not including EXPLAIN ANALYZE results as this is not specific to a single query, rather a question in general. For the sake of maintainability and sanity this project needs to be consistent and this technique is used several times with tables of different structures and use cases (some read heavy, others write heavy).

Community
  • 1
  • 1
losthorse
  • 1,530
  • 1
  • 13
  • 33
  • Unclear. your first query appears to delete only as single row, based on a single PK value: `DELETE FROM some_materialized_view WHERE primary_key = some_value; ...` ; the second one seems to delete a complete *set* of records. Hint: have you attempted a *chained* CTE, using `RETURNING` ? – joop Dec 27 '16 at 17:29
  • @joop - my mistake, I have corrected the offending code. Also, yes; I have use chained CTE's with varying degrees of success. I generally stay away from them due to the overhead (although I love the readability of them). – losthorse Dec 27 '16 at 17:36
  • Note `some_query_generating_some_materialized_view` (the workhorse part, IIUC) is executed exactly once in both cases, so there should be not much differerence *performancewise* between the two methods. It would also make it a candidate for a CTE (does its outcome depend on the other parts of the query ? I suspect it does)) – joop Dec 27 '16 at 17:48
  • @joop - correct; however, the second query does not actually work (for several reasons) ... i'm thinking of adding a `before update` trigger to `some_materialized_view WHERE OLD IS NOT DISTINCT FROM NEW` that aborts the operation. This could be the best of both worlds. – losthorse Dec 27 '16 at 17:55
  • *Follow up to previous comment: aborting the update operation with a before update trigger also removes the record from the `RETURNING` clause. – losthorse Dec 27 '16 at 19:18

1 Answers1

0
  • Pseudo code (I don't like pseudo code)
  • demonstrating chained CTE
  • ignoring locking,serialisation and races.
  • (and semantics/context from the question)

WITH fresh AS ( -- workhorse: Called only and exactly once
        SELECT <keyfields> -- *
        FROM some_query_generating_some_materialized_view
        )
,upd AS ( -- update existing rows
        UPDATE some_materialized_view mv
        SET foo = fr.foo, bar = fr.bar
        FROM fresh fr
        WHERE mv.<keyfields = fr.<keyfields>
        RETURNING mv.<keyfields>
        )
/* 
, del AS ( 
        no deletes ???
        )
 */
        -- insert non existing rows.
INSERT INTO some_materialized_view mv ( <targetfields> )
SELECT fr.<srcfields>
FROM fresh fr
WHERE NOT EXISTS (
        SELECT *
        FROM upd nx
        WHERE nx.<keyfields> = fr.<keyfields>
        );
joop
  • 4,330
  • 1
  • 15
  • 26