
I'm trying to port some old MySQL queries to PostgreSQL, but I'm having trouble with this one:

DELETE FROM logtable ORDER BY timestamp LIMIT 10;

PostgreSQL doesn't allow ORDER BY or LIMIT in its DELETE syntax, and the table doesn't have a primary key so I can't use a subquery. Additionally, I want to preserve the behavior where the query deletes exactly the given number of records -- for example, if the table contains 30 rows but they all have the same timestamp, I still want to delete 10, although it doesn't matter which 10.

So: how do I delete a fixed number of rows, with sorting, in PostgreSQL?

Edit: No primary key means there's no log_id column or similar. Ah, the joys of legacy systems!

Whatsit

6 Answers

230

You could try using the ctid:

DELETE FROM logtable
WHERE ctid IN (
    SELECT ctid
    FROM logtable
    ORDER BY timestamp
    LIMIT 10
);

The ctid is:

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier.

There's also oid but that only exists if you specifically ask for it when you create the table.
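As the comments below point out, the planner (at least as of version 10 or so) may not use a TID scan for `ctid IN (subquery)`, so the plan can be poor on large tables. A sketch of the `= ANY(ARRAY(...))` rewrite suggested in the comments, same table and semantics assumed:

```sql
-- Variant from the comments: materialize the ctids into an array first,
-- which reportedly lets the planner use a TID scan on the outer DELETE.
DELETE FROM logtable
WHERE ctid = ANY(ARRAY(
    SELECT ctid
    FROM logtable
    ORDER BY timestamp
    LIMIT 10
));
```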

mu is too short
    This works, but how reliable is it? Are there any 'gotchas' I need to look out for? Is it possible for `VACUUM FULL` or autovacuum to cause problems if they change the `ctid` values in the table while the query is running? – Whatsit Mar 02 '11 at 20:35
    Incremental VACUUMs won't change ctids, I don't think. Since that just compacts within each page, and the ctid is just the line number not a page offset. A VACUUM FULL or a CLUSTER operation *would* change the ctid, but those operations take an access exclusive lock on the table first. – araqnid Mar 02 '11 at 21:49
  • @Whatsit: My impression of the `ctid` documentation is that `ctid` is stable enough to make this DELETE work okay but not stable enough to, for example, put in another table as a ghetto-FK. Presumably you don't UPDATE the `logtable` so you don't have to worry about that changing `ctid`s and `VACUUM FULL` does lock the table (http://www.postgresql.org/docs/current/static/routine-vacuuming.html) so you don't have to worry about the other way that `ctid`s can change. @araqnid's PostgreSQL-Fu is pretty strong and the docs agree with him to boot. – mu is too short Mar 03 '11 at 05:54
  • Thanks to both of you for the clarification. I did look into the docs but I wasn't certain I was interpreting them correctly. I'd never encountered ctids before this. – Whatsit Mar 03 '11 at 20:54
    This is actually a pretty bad solution since Postgres is not able to use a TID scan in joins (`IN` is a particular case of this). If you look at the plan, it should be quite terrible. So "very quickly" applies only when you specify the ctid explicitly. This is as of version 10. – greatvovan Dec 18 '18 at 03:25
  • This is a very dangerous way to do it, you are asking for troubles, as example from postgres 10 behind that table logtable you can have multiple partitions, and basically you can have multiple rows with ctid (0,1) but on different partitions. Author of this reply should delete it. – Gaetano Mendola Mar 08 '19 at 21:37
  • `IN` in this case is problematic, but like suggested in other comments/answers if you switch to `=any(array(` the execution plan is indeed much better (verified on 11.4). – Pyrocks Nov 26 '19 at 09:35
  • @Whatsit - I ran into a limit in the number of values allowed inside the parens for `IN(...)`. I forget now if it was 1024 or 256 or what exactly, but using `ANY(ARRAY(...))` fixed it. – user9645 Dec 20 '19 at 14:46
76

The Postgres docs recommend using an array instead of IN with a subquery. This should work much faster:

DELETE FROM logtable 
WHERE id = any (array(SELECT id FROM logtable ORDER BY timestamp LIMIT 10));

This and some other tricks can be found here.
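If you want to verify the speedup on your own data, note that `EXPLAIN ANALYZE` works on `DELETE` too; since it really executes the delete, wrap it in a transaction you roll back. A sketch:

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM logtable
WHERE id = any (array(SELECT id FROM logtable ORDER BY timestamp LIMIT 10));
ROLLBACK;  -- EXPLAIN ANALYZE actually ran the DELETE, so undo it
```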

pd40
criticus
  • @Konrad Garus Here you go [link](http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks), 'Fast first n rows removing' – criticus Mar 02 '11 at 17:24
    @BlakeRegalia No, because there's no primary key in the table specified. This will delete all the rows with an "ID" found in the first 10. If all the rows have the same ID all the rows will be deleted. – Philip Whitehouse Jan 14 '15 at 12:25
    If `any (array( ... ));` is faster than `in ( ... )` that sounds like a bug in the query optimiser - it should be able to spot that transformation and do the same thing with the data itself. – rjmunro Oct 09 '15 at 11:10
    I found this method to be considerable slower than using `IN` on an `UPDATE` (which might be the difference). – jmervine Dec 18 '16 at 04:00
    Measurement on a 12 GB table: first query 450..1000 ms, second one 5..7 seconds. Fast one: `delete from cs_logging where id = any (array( select id from cs_logging where date_created < now() - interval '1 days' * 30 and partition_key like '%I' order by id limit 500 ))`. Slow one: `delete from cs_logging where id in ( select id from cs_logging where date_created < now() - interval '1 days' * 30 and partition_key like '%I' order by id limit 500 )`. Using ctid was a lot slower (minutes). – Guido Leenders May 16 '18 at 17:56
    If you look at the postgres source, it (sometimes, depending on other factors) assumes the parameter to ANY to be of size 10 in the optimizer. That is why it might be faster for some and slower for others - it very much depends on which plan is faster on your db. – griffin Apr 21 '23 at 20:32
19

delete from logtable where log_id in (
    select log_id from logtable order by timestamp limit 10);
Konrad Garus
4

If you don't have a primary key, you can use the `WHERE ... IN` syntax with a composite key:

delete from table1 where (schema,id,lac,cid) in (select schema,id,lac,cid from table1 where lac = 0 limit 1000);

This worked for me.

2

Assuming you want to delete ANY 10 records (ignoring the ordering), you could do this:

DELETE FROM logtable AS t1
WHERE t1.ctid < (
    SELECT t2.ctid
    FROM logtable AS t2
    WHERE (SELECT count(*) FROM logtable t3 WHERE t3.ctid < t2.ctid) = 10
    LIMIT 1
);

For my use case, deleting 10M records, this turned out to be faster.

1

You could write a procedure that loops, deleting one row at a time, and takes a parameter for the number of rows you want to delete. But that's a bit of overkill compared to MySQL.
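For completeness, a sketch of that idea as an anonymous `DO` block (PL/pgSQL, PostgreSQL 9.0+). The row count and the use of `ctid` as the per-row handle are assumptions; a real stored procedure would take the count as a parameter:

```sql
DO $$
DECLARE
    rows_to_delete integer := 10;  -- hypothetical: number of rows to remove
    r record;
BEGIN
    -- Delete the oldest rows one at a time, addressing each by its ctid,
    -- since the table has no primary key.
    FOR r IN
        SELECT ctid FROM logtable ORDER BY timestamp LIMIT rows_to_delete
    LOOP
        DELETE FROM logtable WHERE ctid = r.ctid;
    END LOOP;
END $$;
```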

Bernhard