1

Explanation:

Lets say we have a table:

userid points
123 1
456 1

Both userid and points is of type int or any other numeric data type. And userid is my PK.
Now, in my table I would like to perform an update query, and if the row does not exist I would like to insert the row. If the user already exists I would like to increment the points by 1, otherwise insert the userid and points to be 1 as default.

I am aware I can perform an upsert like this:

INSERT INTO table(userid, points) VALUES(123, 1)
ON conflict (userid)
DO UPDATE
SET points = table.points + 1 
where table.userid = 123;

However, in my case update operation is more frequent than inserting a new row. Lets say there are 5000 queries each day and around 4500 of those rows are UPDATE operations on already existing rows. Doing an opposite of upsert would be more beneficial since the conflict will reduce to 500 times instead of 4500. I would like to try to UPDATE first and if it returns UPDATE 0 I would like to perform an INSERT.

Is it possible to do the above using RETURNING or FOUND or something else in a single query? Or is the benefit for the above if possible is too insignificant and upsert is the way to go?

A simple representation of what I want to do using python and asyncpg (2 queries):

import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(user='user', password='password',
                                 database='database')

    output = await conn.execute("UPDATE table set points = points + 1 where userid = $1", 123)

    if output == "UPDATE 0":
        await conn.execute("INSERT INTO table(userid, points) values($1, $2)", 123, 0)

    await conn.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())

Questions I already have checked:

  • This is the most similar to my question for mysql but unfortunately does not have an answer.
  • This kind of works but still uses 2 separate queries and rely upon one query failing/ignoring.
  • I also have read This, This, This, This, This and This but they dont answer my questions.
Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Rockboy987
  • 131
  • 1
  • 6

1 Answers1

2

Your proposed code has a race condition: someone could insert a row between the UPDATE and the INSERT, making both fail. The only safe technique is an endless loop that tries both statements until one of them succeeds.

Since every statement requires a client-server round trip, I doubt that your code will perform better than INSERT ... ON CONFLICT.

Rather than making an unfounded assumption that INSERT ... ON CONFLICT is much slower than UPDATE, you should benchmark both solutions.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • 3
    A while ago I did indeed ran benchmarks (using JMeter) with a lot of concurrent threads than all ran `INSERT ON CONFLICT` statements. When I set it up so that every insert was turned into an update, this performed essentially the same as if I had run only updates in the first place. The same was true, if all inserts were true inserts. And thinking about it, it doesn't really surprise me. In any case Postgres needs to lookup if a row exists and the resulting physical update or insert isn't different. –  Mar 04 '21 at 07:40