
Among the many things we do with Postgres at work, we use it as a cache for certain kinds of remote requests. Our schema is:

CREATE TABLE IF NOT EXISTS cache (
    key VARCHAR(256) PRIMARY KEY,
    value TEXT NOT NULL,
    ttl TIMESTAMP DEFAULT NULL
);

CREATE INDEX IF NOT EXISTS idx_cache_ttl ON cache(ttl);

This table does not have triggers or foreign keys. Updates are typically:

INSERT INTO cache (key, value, ttl)
VALUES ('Ethan is testing8393645', '"hi6286166"', sec2ttl(300))
ON CONFLICT (key) DO UPDATE
SET value = '"hi6286166"', ttl = sec2ttl(300);

(Where sec2ttl is defined as:)

CREATE OR REPLACE FUNCTION sec2ttl(seconds FLOAT)
RETURNS TIMESTAMP AS $$
BEGIN
    IF seconds IS NULL THEN
        RETURN NULL;
    END IF;
    RETURN now() + (seconds || ' SECOND')::INTERVAL;
END;
$$ LANGUAGE plpgsql;
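
(Incidentally, the explicit NULL check is redundant, since adding a NULL interval to now() already yields NULL; a behavior-equivalent sketch as a plain SQL function:)

CREATE OR REPLACE FUNCTION sec2ttl(seconds FLOAT)
RETURNS TIMESTAMP AS $$
    -- a NULL seconds argument propagates: now() + NULL is NULL
    SELECT (now() + seconds * INTERVAL '1 second')::TIMESTAMP;
$$ LANGUAGE sql;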

Querying the cache is done in a transaction like this:

BEGIN;
DELETE FROM cache WHERE ttl IS NOT NULL AND now() > ttl;
SELECT value FROM cache WHERE key = 'Ethan is testing6460437';
COMMIT;

There are a few things not to like about this design: the DELETE happening in cache "reads", the index on cache.ttl seeming useless because it isn't declared ascending (edit: ASC is the default, thanks wargre!), plus the fact that we're using Postgres as a cache at all. But all of that would have been acceptable, except that we've started getting deadlocks in production, which tend to look like this:

ERROR: deadlock detected
DETAIL:  Process 12750 waits for ShareLock on transaction 632693475; blocked by process 10080.
Process 10080 waits for ShareLock on transaction 632693479; blocked by process 12750.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (426,1) in relation "cache"
 [SQL: 'DELETE FROM cache WHERE ttl IS NOT NULL AND now() > ttl;']

Investigating the logs more thoroughly indicates that both transactions were performing this DELETE operation.

As far as I can tell:

  • My transactions are in READ COMMITTED isolation mode.
  • ShareLocks are grabbed by a transaction when it wants to mutate rows that another, still-running transaction has already mutated (i.e. locked); it waits on that transaction's ID until the holder commits or rolls back (visible in pg_locks; see the sketch after this list).
  • Based on the output of an EXPLAIN query, the ShareLocks should be grabbed by both DELETE transactions in physical order.
  • The deadlock indicates that the two queries locked rows in different orders.
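
For what it's worth, while one of the DELETEs is blocked, the waiting ShareLock can be observed in the pg_locks system view (a diagnostic sketch; locktype 'transactionid' is the lock taken on another transaction's ID):

SELECT pid, locktype, transactionid, mode, granted
FROM pg_locks
WHERE locktype = 'transactionid' AND NOT granted;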

If all that is correct, then somehow some simultaneous transaction has changed the physical order of rows. I see that an UPDATE can move a row to an earlier or later physical position, but in my application, the UPDATEs always remove rows from consideration by the DELETEs (because they're always extending a row's TTL). If the rows were previously in physical order, and you remove one, then you're still left with physical order. Similarly for DELETE. We're not doing any VACUUM or any other operation which you might expect to reorder rows.
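
For intuition, this is the textbook shape of such a deadlock, written out as two sessions with hypothetical keys 'a' and 'b' (a single multi-row DELETE takes its row locks one at a time as its scan proceeds, so it can interleave the same way):

-- Session 1                          -- Session 2
BEGIN;                                BEGIN;
DELETE FROM cache WHERE key = 'a';
                                      DELETE FROM cache WHERE key = 'b';
DELETE FROM cache WHERE key = 'b';    -- blocks, waiting on session 2
                                      DELETE FROM cache WHERE key = 'a';
                                      -- blocks, waiting on session 1: deadlock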

Based on Avoiding PostgreSQL deadlocks when performing bulk update and delete operations, I tried to change the DELETE queries to:

DELETE FROM cache c
USING (
   SELECT key
   FROM cache
   WHERE ttl IS NOT NULL AND now() > ttl
   ORDER BY ttl ASC
   FOR UPDATE
) del
WHERE del.key = c.key;

However, I'm still able to get deadlocks locally. So generally, how can two DELETE queries deadlock? Is it because they're locking in an undefined order, and if so, how do I enforce a specific order?

Ethan
  • What about managing your cache in a connection with autocommit and without a transaction? (BTW, the index is correct; it is used by your DELETE.) – wargre Aug 10 '17 at 18:56
  • I think even if I don't have a transaction, Postgres creates a transaction around every single query, right? And the DELETEs seem to deadlock all by themselves. (BTW, thanks! I see ASC is the default for indices.) – Ethan Aug 10 '17 at 21:29
  • 2
    The point of the `SELECT ... FOR UPDATE` is to enforce a globally consistent order for all lock acquisitions. If two `ttl` values coincide, your row ordering is undefined, and if a `ttl` value is updated, then the ordering, while well-defined, may differ between concurrent transactions. Try `ORDER BY key` instead. – Nick Barnes Aug 10 '17 at 22:31
  • Thanks, adding `ORDER BY key` makes the deadlocks go away. But in my application, `ttl` values are always pushed 300 seconds after `now()`, which means they shouldn't be affected by the DELETE any more. So while it's true that the ordering may differ between transactions, one order will always be a subset of the other. Similarly, I guess if two records have the same `ttl`, the ordering is undefined, but whatever the behavior is, should be the same in both transactions, right? (I assumed it was physical order.) – Ethan Aug 11 '17 at 17:31
  • OK, I did more investigation and it seems like the two transactions are deadlocking on two equal `ttl`s, so I guess your "row ordering is undefined" theory is correct. If you make it an answer, I'll mark it as accepted. – Ethan Aug 12 '17 at 14:33
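
For reference, a sketch of the variant Nick Barnes suggests: ordering the FOR UPDATE scan on the primary key, so that every transaction acquires its row locks in one globally consistent order:

DELETE FROM cache c
USING (
   SELECT key
   FROM cache
   WHERE ttl IS NOT NULL AND now() > ttl
   ORDER BY key  -- unique and never updated, so all transactions agree on this order
   FOR UPDATE
) del
WHERE del.key = c.key;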

1 Answer


You should instead ignore expired cache entries when reading, so that you don't depend on frequent DELETEs for cache expiration:

SELECT value
FROM cache
WHERE
  key = 'Ethan is testing6460437'
  and (ttl is null or ttl > now());  -- keep only unexpired entries

And have another job that periodically deletes expired keys. It has to either force a well-defined order on the deleted rows or, better, skip rows that are already locked FOR UPDATE:

with delete_keys as (
  select key from cache
  where
    ttl is not null
    and now()>ttl
  for update skip locked
)
delete from cache
where key in (select key from delete_keys);

If you can't schedule this periodically, you can run the cleanup on a random small fraction of your SELECT queries, say once in every 1,000 runs:

create or replace function delete_expired_cache()
returns void
language sql
as $$
  with delete_keys as (
    select key from cache
    where
      ttl is not null
      and now()>ttl
    for update skip locked
  )
  delete from cache
  where key in (select key from delete_keys);
$$;

SELECT value
FROM cache
WHERE
  key = 'Ethan is testing6460437'
  and (ttl is null or ttl > now());
select delete_expired_cache() where random()<0.001;

You should avoid writes, as they are expensive; don't delete cache entries so often.

Also, you should use the timestamp with time zone type (timestamptz for short) instead of plain timestamp, especially if you don't know why: a plain timestamp is not what most people think it is. Blame the SQL standard.
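
A minimal illustration of the difference (assuming a session time zone other than UTC; the cast to plain timestamp silently discards the offset):

set time zone 'America/New_York';
select '2017-08-10 18:56:00+00'::timestamptz; -- 2017-08-10 14:56:00-04, the same instant in local time
select '2017-08-10 18:56:00+00'::timestamp;   -- 2017-08-10 18:56:00, the offset is thrown away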

Tometzky
  • Thanks -- but I already know the problems with the design. My question isn't how to fix the design, but why does this design cause a deadlock? (Also, thanks for the tip regarding timestamptz, but we `SET TIMEZONE TO UTC` on our database; our application doesn't really expose any timestamps except UTC ones.) – Ethan Aug 10 '17 at 21:24