
I have a table, event, with a column unique_time of type timestamptz. I need each of the values in unique_time to be unique.

Given a timestamptz input, input_time, I need to find the minimum timestamptz value that satisfies the following criteria:

  • the result must be >= input_time
  • the result must not already be in unique_time

I cannot merely add one microsecond to the greatest value in unique_time, because I need the minimum value that satisfies the above criteria.

Is there a concise way to compute this as part of an insert or update to the event table?

Erwin Brandstetter
Laurence Gonsalves
  • So even if it's just one microsecond apart, it would count? How many values do you have in `unique_time` that would likely collide with the `input_time`? – Bergi May 16 '22 at 22:32
  • @Bergi As long as they aren't **exactly** the same time, it's fine. So two values that are one microsecond apart are acceptable. I'd expect that in most cases, the number of values that would collide would be very small, but there may be some corner cases that could cause there to be several dozen collisions periodically. It's infrequent enough that it would be ok if performance suffers a bit in those cases, but frequent enough that I'd still want it to work correctly in those cases. – Laurence Gonsalves May 16 '22 at 22:39
  • 2
    I can think of 3 approaches: a) use [`generate_series($input_time, $input_time+100 microsecond, 1 microsecond)`](https://www.postgresql.org/docs/current/functions-srf.html#FUNCTIONS-SRF-SERIES), select minimum `WHERE NOT EXISTS(…)`. Downside: it needs a stop value and might fail. Also no idea about performance (it might actually generate all rows). b) Use a CTE to create a real infinite sequence, in the correct order. I remember a SO post doing that (with integers) but can't find the link. c) use PL/SQL and a simple `WHILE` loop that increments the timestamp as long as the insertion fails. – Bergi May 16 '22 at 22:50
  • Please add your table definition to the question plus some test data to play with. Given that, it would be rather trivial, I expect. – wildplasser May 16 '22 at 22:58

2 Answers


I suggest a function with a loop:

CREATE OR REPLACE FUNCTION f_next_free(_input_time timestamptz, OUT _next_free timestamptz)
  LANGUAGE plpgsql STABLE STRICT AS
$func$
BEGIN
   LOOP
      -- return the candidate only if no row in event occupies it yet
      SELECT INTO _next_free  _input_time
      WHERE  NOT EXISTS (SELECT FROM event WHERE unique_time = _input_time);

      EXIT WHEN FOUND;
      _input_time := _input_time + interval '1 us';  -- else try the next microsecond
   END LOOP;
END
$func$;

Call:

SELECT f_next_free('2022-05-17 03:44:22.771741+02');

Be sure to have an index on event(unique_time). If the column is defined UNIQUE or PRIMARY KEY, that index is there implicitly.
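
For reference, a minimal sketch of a matching table definition (the payload column is an assumption, mirroring the second function further down; adapt names to your actual table):

CREATE TABLE event (
   unique_time timestamptz PRIMARY KEY  -- provides the index the function relies on
 , payload     text                     -- assumed extra column
);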

Since Postgres timestamps have microsecond resolution, the next free timestamp is at least 1 microsecond (interval '1 us') away.

Could also be a recursive CTE, but the overhead is probably bigger.
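
For illustration, a rough sketch of what that recursive CTE could look like (assuming the same event table; like the function above, it walks forward one microsecond at a time and is not concurrency-safe):

WITH RECURSIVE cand AS (
   SELECT '2022-05-17 03:44:22.771741+02'::timestamptz AS ts      -- the input time
   UNION ALL
   SELECT ts + interval '1 us'
   FROM   cand
   WHERE  EXISTS (SELECT FROM event WHERE unique_time = cand.ts)  -- keep stepping while the slot is taken
)
SELECT ts AS next_free
FROM   cand
WHERE  NOT EXISTS (SELECT FROM event WHERE unique_time = cand.ts)
LIMIT  1;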

Concurrency!

Is there a concise way to compute this as part of an INSERT or UPDATE to the event table?

The above is obviously subject to a race condition. Any number of concurrent transactions might find the same free spot. Postgres cannot lock rows that aren't there yet.

Since you want to INSERT (similar for UPDATE), I suggest running INSERT ... ON CONFLICT DO NOTHING directly in a loop instead. Again, we need a UNIQUE constraint or PRIMARY KEY on unique_time:

CREATE OR REPLACE FUNCTION f_next_free(INOUT _input_time timestamptz, _payload text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      INSERT INTO event (unique_time, payload)
      VALUES (_input_time, _payload)
      ON CONFLICT (unique_time) DO NOTHING;  -- skip silently if the slot is taken

      EXIT WHEN FOUND;                       -- FOUND is true only if a row was inserted
      _input_time := _input_time + interval '1 us';
   END LOOP;
END
$func$;

Adapt your "payload" accordingly.

A successful INSERT locks the row. Even if concurrent transactions cannot see the inserted row yet, a UNIQUE index is absolute.
(You could make it work with advisory locks ...)
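
Just to illustrate that remark, a hypothetical sketch with a transaction-level advisory lock: all writing sessions agree on one arbitrary key (42 here), so only one of them searches for a free slot at a time. This serializes writers and is only worth considering if the ON CONFLICT loop doesn't suit you:

SELECT pg_advisory_xact_lock(42);   -- arbitrary key; released at transaction end
SELECT f_next_free('2022-05-17 03:44:22.771741+02');
-- ... INSERT with the returned timestamp inside the same transaction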

Erwin Brandstetter

Ah, forget about the approaches from my comment that would try to generate an (infinite) sequence of all microsecond timestamps following $input_time. There's a much simpler query that can generate exactly the timestamp you need:

INSERT INTO event(unique_time, others)
SELECT MIN(candidates.time), $other_values
FROM (
  SELECT $input_time AS "time"
UNION ALL
  SELECT unique_time + interval '1 microsecond' AS time
  FROM event
  WHERE unique_time >= $input_time
) AS candidates
WHERE NOT EXISTS (
  SELECT *
  FROM event coll
  WHERE coll.unique_time = candidates.time
);

However, I'm not sure how well Postgres can optimise this: the MIN aggregate might load all the timestamps from event that are larger than $input_time, which might be fine if you always append events near the end, but still. A probably better alternative would be:

INSERT INTO event(unique_time, others)
SELECT available.time, $other_values
FROM (
  SELECT *
  FROM (
    SELECT $input_time AS "time"
  UNION ALL
    SELECT unique_time + interval '1 microsecond' AS time
    FROM event
    WHERE unique_time >= $input_time
  ) AS candidates
  WHERE NOT EXISTS (
    SELECT *
    FROM event coll
    WHERE coll.unique_time = candidates.time
  )
  ORDER BY candidates.time ASC
) AS available
ORDER BY available.time ASC
LIMIT 1;

This might (I don't know) still have to evaluate the complex subquery every time you insert something, though, which would be rather inefficient if most of the inputs don't cause a collision. Also, I have no idea how well this works under concurrent load (i.e. multiple transactions running the query at the same time) and whether it has possible race conditions.

Alternatively, just use a WHILE loop (in the client or in PL/pgSQL) that attempts to insert the value until it succeeds, incrementing the timestamp on every iteration - see @Erwin Brandstetter's answer for that.

Bergi