1

I have an INSERT ... ON CONFLICT (primary_key) DO UPDATE query in a PostgreSQL 12 database.

It requires me to do the same calculation for multiple columns in the update clause. The calculation isn't too complex, but I would prefer not to maintain it in two places. The calculation contains a reference to both the conflicting row and to the EXCLUDED virtual table.

Is there any way to perform this calculation once and assign its result to a temporary variable within a single query expression?

The full query is:

INSERT INTO table_name(id, refreshed_at, tokens) VALUES ('id', CURRENT_TIMESTAMP, 60)
    ON CONFLICT (id) DO UPDATE SET
        tokens = GREATEST(
          -1, 
          LEAST(
            GREATEST(0, table_name.tokens) 
              + EXTRACT(epoch FROM EXCLUDED.refreshed_at) 
              - EXTRACT(epoch FROM table_name.refreshed_at), 
            100
          ) - 1
        ),
        refreshed_at = CASE 
          WHEN (
            GREATEST(0, table_name.tokens) 
              + EXTRACT(epoch FROM EXCLUDED.refreshed_at) 
              - EXTRACT(epoch FROM table_name.refreshed_at)
            ) > 0 THEN EXCLUDED.refreshed_at 
          ELSE table_name.refreshed_at 
          END
        RETURNING tokens >= 0;

The tokens column is calculated from the column's prior value the difference in seconds between the time of query and the last time the column was updated. The refreshed_at column should only be changed if the tokens column value was updated, so the same calculation has to be performed in both SET clauses.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
giaour
  • 3,878
  • 2
  • 25
  • 27

1 Answers1

1

The refreshed_at column should only be changed if the tokens column value was updated

Your original idea should work with a subquery:

INSERT INTO table_name(id, refreshed_at, tokens)
VALUES ('id', CURRENT_TIMESTAMP, 60)
ON CONFLICT (id) DO UPDATE
SET    (tokens, refreshed_at)
     = (SELECT GREATEST(-1, LEAST(GREATEST(0, table_name.tokens) + sub.diff, 100) - 1)
             , CASE WHEN sub.diff > 0 THEN EXCLUDED.refreshed_at ELSE table_name.refreshed_at END
        FROM  (SELECT EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)::int) sub(diff))
RETURNING tokens >= 0;

That's an allowed variant of the UPDATE syntax. This way, the difference between timestamps in seconds is only calculated once.

The calculation is cheaper, too, extracting the epoch from the difference (the resulting interval):

EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)

instead of extracting twice and subtracting like in your original:

EXTRACT(epoch FROM EXCLUDED.refreshed_at) - EXTRACT(epoch FROM table_name.refreshed_at)

But the overhead from the added subquery may be more expensive than doing the cheap calculation twice!

Alternative solution

Consider adding a WHERE clause to the UPDATE part instead. Like:

INSERT INTO table_name(id, refreshed_at, tokens)
VALUES ('id', CURRENT_TIMESTAMP, 60)
ON CONFLICT (id) DO UPDATE
SET     tokens = GREATEST(-1, LEAST(GREATEST(0, table_name.tokens) + EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at), 100) - 1)
      , refreshed_at = EXCLUDED.refreshed_at
WHERE   EXTRACT(epoch FROM EXCLUDED.refreshed_at - table_name.refreshed_at)::int > 0
RETURNING tokens >= 0;

That would not touch rows at all where the time difference is below 1 second. That's typically superior because empty updates just add cost and bloat. (The cast rounds, you may want to truncate instead?)

Of course that would also not return a row where neither the INSERT nor the UPDATE part went through. If that's a problem, consider:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228