running insert on conflict when most queries will result in a conflict

Question

In my postgres database, I have a table that is used to map two fields together with a serial primary key, but I will rarely get new mappings. I may get data sent to me 60k times per day (or more) and only get new mappings a handful of times every month. I wrote a query with the on conflict clause:

insert into mapping_table
values
(field1, field2)
on conflict (field1, field2) do nothing

This seems to work as expected! But I'm wondering if running the insert query tens of thousands of times per day when a mapping rarely needs to be added is problematic somehow? And if so, is there a better way to do that?

I'm also not sure if there's a problem with the serial primary key. As expected, the value auto increments even though there is a conflict. I don't have any logic that expects that the primary key won't have gaps, but the numbers will get large very fast, and I'm not sure if that could become a problem in the future.

"*and I'm not sure if that could become a problem in the future*" - if you burn 10000 values per second, 24/7 without any pause, it will take 9223372036854775807 years until the sequence is exhausted — , Sep 24 '20 at 20:30
Or 98 years, for 60k a day, with a regular (integer) SERIAL. — jarlh, Sep 24 '20 at 20:41

score 1 · Answer 1 · answered Sep 24 '20 at 20:26

There are 86,400 seconds in a day. So, 60k times per day is less than once per second.

Determining the conflict is a lookup in a unique index, and then perhaps some additional checking for locking and so on. Under most circumstances, this would be an acceptable load on the system.

score 1 · Answer 2 · answered Sep 24 '20 at 20:27

on conflict seems like the right tool for this. It is built-in, probably fairly optimized, and implements the logic you want with a straight-forward syntax.

If you want to avoid "burning" sequence numbers though, an alternative is not exists, to check if a row exists before inserting it:

insert into mapping_table (field1, field2)
select *
from (values (field1, field2)) v
where not exists (
    select 1 
    from mapping_table mp 
    where mp.field1 = v.field1 and mp.field2 = v.field2
)

This query would take advantage of the already existing index on (field1, field2). It will most likely not be faster than on conflict though, and probably slower.

running insert on conflict when most queries will result in a conflict

2 Answers2