In UPDATE part of UPSERT, compute expression once for multiple columns

Question

I have an INSERT ... ON CONFLICT (primary_key) DO UPDATE query in a PostgreSQL 12 database.

It requires me to do the same calculation for multiple columns in the update clause. The calculation isn't too complex, but I would prefer not to maintain it in two places. The calculation contains a reference to both the conflicting row and to the EXCLUDED virtual table.

Is there any way to perform this calculation once and assign its result to a temporary variable within a single query expression?

The full query is:

INSERT INTO table_name(id, refreshed_at, tokens) VALUES ('id', CURRENT_TIMESTAMP, 60)
    ON CONFLICT (id) DO UPDATE SET
        tokens = GREATEST(
          -1, 
          LEAST(
            GREATEST(0, table_name.tokens) 
              + EXTRACT(epoch FROM EXCLUDED.refreshed_at) 
              - EXTRACT(epoch FROM table_name.refreshed_at), 
            100
          ) - 1
        ),
        refreshed_at = CASE 
          WHEN (
            GREATEST(0, table_name.tokens) 
              + EXTRACT(epoch FROM EXCLUDED.refreshed_at) 
              - EXTRACT(epoch FROM table_name.refreshed_at)
            ) > 0 THEN EXCLUDED.refreshed_at 
          ELSE table_name.refreshed_at 
          END
        RETURNING tokens >= 0;

The tokens column is calculated from the column's prior value the difference in seconds between the time of query and the last time the column was updated. The refreshed_at column should only be changed if the tokens column value was updated, so the same calculation has to be performed in both SET clauses.

Just show the statement (even if it's not working), and your version of Postgres. — Erwin Brandstetter, Apr 30 '21 at 22:19
If the calculation contains the conflicting row, how will you reuse the result of one calculation? Perhaps a function makes more sense? — Schwern, Apr 30 '21 at 22:31
@ErwinBrandstetter Added. The query works fine but I would prefer not repeating it in case it needs to be updated later. — giaour, Apr 30 '21 at 22:39
Parentheses don't seem to match. Also: data type of `refreshed_at`? — Erwin Brandstetter, Apr 30 '21 at 22:46
`refreshed_at` is a timestamp, `tokens` is an integer, `id` is text. I did lightly edit the query before pasting it in, so the parens may be off. — giaour, Apr 30 '21 at 22:51

Erwin Brandstetter · Accepted Answer · 2021-04-30T23:31:20.757

The refreshed_at column should only be changed if the tokens column value was updated

Your original idea should work with a subquery:

INSERT INTO table_name(id, refreshed_at, tokens)
VALUES ('id', CURRENT_TIMESTAMP, 60)
ON CONFLICT (id) DO UPDATE
SET    (tokens, refreshed_at)
     = (SELECT GREATEST(-1, LEAST(GREATEST(0, table_name.tokens) + sub.diff, 100) - 1)
             , CASE WHEN sub.diff > 0 THEN EXCLUDED.refreshed_at ELSE table_name.refreshed_at END
        FROM  (SELECT EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)::int) sub(diff))
RETURNING tokens >= 0;

That's an allowed variant of the UPDATE syntax. This way, the difference between timestamps in seconds is only calculated once.

The calculation is cheaper, too, extracting the epoch from the difference (the resulting interval):

EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)

instead of extracting twice and subtracting like in your original:

EXTRACT(epoch FROM EXCLUDED.refreshed_at) - EXTRACT(epoch FROM table_name.refreshed_at)

But the overhead from the added subquery may be more expensive than doing the cheap calculation twice!

Alternative solution

Consider adding a WHERE clause to the UPDATE part instead. Like:

INSERT INTO table_name(id, refreshed_at, tokens)
VALUES ('id', CURRENT_TIMESTAMP, 60)
ON CONFLICT (id) DO UPDATE
SET     tokens = GREATEST(-1, LEAST(GREATEST(0, table_name.tokens) + EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at), 100) - 1)
      , refreshed_at = EXCLUDED.refreshed_at
WHERE   EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)::int > 0
RETURNING tokens >= 0;

That would not touch rows at all where the time difference is below 1 second. That's typically superior because empty updates just add cost and bloat. (The cast rounds, you may want to truncate instead?)

Of course that would also not return a row where neither the INSERT nor the UPDATE part went through. If that's a problem, consider:

How to use RETURNING with ON CONFLICT in PostgreSQL?

In UPDATE part of UPSERT, compute expression once for multiple columns

1 Answers1

Alternative solution