5

Suppose I have a database table containing 200k+ rows.
This table has one fixed tuple with id 1800; the ids of the remaining tuples start at 300k+.
I need to clean this table, deleting every record except the single one with id 1800. I came up with 3 queries I could possibly run:

DELETE FROM table WHERE id > 1800
DELETE FROM table WHERE id <> 1800
DELETE FROM table WHERE id NOT IN (1800)

I have a feeling that the first one is quicker than the others, but I am not sure, since all of the other rows have ids far greater than 1800.

Which one of them is quicker, and why? Also, if there is a quicker way to delete every record except the one that must be kept, let me know.

Erwin Brandstetter
MurifoX
  • The fastest will probably be to copy the one record you want into a temp table, truncate the table, then insert the one record back – lc. Jan 16 '13 at 19:25
  • Unless the query optimizer is total crap (which is very unlikely), the three are just the same. – Damon Jan 16 '13 at 19:26
  • It depends. If id is a (primary) key (or an indexed field) the three queries will have comparable performance. Else : anything goes, depending on indices and cardinality of the id-domain. – wildplasser Jan 16 '13 at 19:43
  • You may be interested in the closely related answers [here](http://stackoverflow.com/a/8290958/939860) and [here](http://stackoverflow.com/a/12824238/939860). – Erwin Brandstetter Jan 17 '13 at 01:21
  • Are there any foreign keys involved? If so, that could greatly affect performance as well. Anyway, copying the record out and truncating might work, if there are no foreign keys. If there are foreign keys, then deleting all the other rows is the way to go, sadly, as it's pretty slow. Also, if there are FKs, make sure there are indexes on the referencing columns. – Scott Marlowe Jan 18 '13 at 08:03

6 Answers

10

The quickest way in most databases would be:

  1. Select the record with id 1800 into a temporary table
  2. Drop the original table
  3. Copy the data from the temp table into the full table

Admittedly, this may not be possible due to triggers, constraints, and permissions. In many databases you can do something similar by modifying (2) to truncate the table instead of dropping it.

As for your original question, the overhead of actually deleting the rows and the data associated with them is going to dominate the query. How you write the comparison is irrelevant.

Sample code

CREATE TEMP TABLE saved AS
    SELECT * FROM t WHERE id = 1800;

TRUNCATE TABLE t;

INSERT INTO t
    SELECT * FROM saved;

I'm not sure about Postgres naming conventions for temporary tables, but this is the idea.
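
For illustration only, here is the same save/truncate/re-insert pattern sketched with Python's built-in sqlite3 module (the table `t` and its columns are made up, and SQLite has no TRUNCATE, so an unqualified DELETE stands in for it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A toy table: one row to keep (id 1800), many to discard.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
cur.execute("INSERT INTO t VALUES (1800, 'keep me')")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "junk") for i in range(300000, 300100)])

# 1. Save the row we want into a temp table.
cur.execute("CREATE TEMP TABLE saved AS SELECT * FROM t WHERE id = 1800")
# 2. Empty the original table (SQLite lacks TRUNCATE; a DELETE with no
#    WHERE clause uses its fast "truncate optimization" instead).
cur.execute("DELETE FROM t")
# 3. Copy the saved row back.
cur.execute("INSERT INTO t SELECT * FROM saved")
conn.commit()

print(cur.execute("SELECT id, payload FROM t").fetchall())
# [(1800, 'keep me')]
```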

Gordon Linoff
  • While this is quick, it does so with DDL statements instead of DML statements, which can be disastrous if you have to concern yourself with inter-table dependencies. I would favor adding an index on `id` (if it isn't there already) and doing the `delete` if you wanted safer data consistency; because, to run the delete, you need to select the records first (so an index speeds up part of the operation). However, if raw speed is your only concern, this is probably the fastest way to go. – Edwin Buck Jan 16 '13 at 21:18
  • Truncate is almost always better than dropping and recreating a table. What if there are foreign keys etc? Much simpler to truncate, and just as fast as dropping and recreating the table. (if not faster) – Scott Marlowe Jan 18 '13 at 08:01
6

As long as they affect the same records, they will have similar performance.

There is a slight chance the first one will use an index seek rather than the more efficient full table scan, but the difference is negligible.
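
If in doubt, you can always ask the planner. A quick sketch using Python's built-in sqlite3 module (Postgres's EXPLAIN output looks different, but the idea carries over):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")

# Ask the planner how it would execute each of the three variants.
for cond in ("id > 1800", "id <> 1800", "id NOT IN (1800)"):
    plan = cur.execute(
        "EXPLAIN QUERY PLAN DELETE FROM t WHERE " + cond
    ).fetchall()
    print(cond, "->", plan)
```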

Quassnoi
  • Thanks for the great answer. However, I will accept @GordonLinoff's answer, as it provides a different technique to delete the records. – MurifoX Jan 16 '13 at 19:30
4

If you can't move the row you want to keep into a new table, you might want to try deleting in groups or batches. A single transaction covering a very large number of records is not always handled fastest. This is the case for most databases, Oracle and Microsoft database products included.

BEGIN TRANSACTION;
DELETE FROM table WHERE id >= 0 AND id < 20000 AND id != 1800;
COMMIT TRANSACTION;

BEGIN TRANSACTION;
DELETE FROM table WHERE id >= 20000 AND id < 40000 AND id != 1800;
COMMIT TRANSACTION;

-- and so on for the remaining id ranges
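
The same batch-and-commit idea can be sketched with Python's built-in sqlite3 module (the table name, kept id, and batch size here are all illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO t VALUES (?)",
                [(i,) for i in [1800] + list(range(300000, 300500))])

KEEP_ID = 1800
BATCH = 20000
max_id = cur.execute("SELECT MAX(id) FROM t").fetchone()[0]

# Delete in fixed-size id ranges, committing after each batch so no
# single transaction has to cover the whole table.
for lo in range(0, max_id + 1, BATCH):
    cur.execute(
        "DELETE FROM t WHERE id >= ? AND id < ? AND id <> ?",
        (lo, lo + BATCH, KEEP_ID),
    )
    conn.commit()

print(cur.execute("SELECT id FROM t").fetchall())  # [(1800,)]
```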
Kuberchaun
1

If you want to keep only the most recent record and remove all the others, you can use the query below, which worked for me:

delete from public.table_name
WHERE lastrun_ts < (
   select MAX(lastrun_ts)
   FROM public.table_name
   );
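
A runnable sketch of the keep-only-the-latest-row idea, using Python's built-in sqlite3 module (the table and column names mirror the hypothetical ones above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE runs (id INTEGER, lastrun_ts TEXT)")
cur.executemany("INSERT INTO runs VALUES (?, ?)", [
    (1, "2023-01-01 10:00:00"),
    (2, "2023-01-02 10:00:00"),
    (3, "2023-01-03 10:00:00"),  # most recent row: the only survivor
])

# Delete every row older than the maximum timestamp.
cur.execute(
    "DELETE FROM runs WHERE lastrun_ts < (SELECT MAX(lastrun_ts) FROM runs)"
)
conn.commit()

print(cur.execute("SELECT id FROM runs").fetchall())  # [(3,)]
```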
sanyassh
0

I just deleted around 80k rows, keeping only the one with ID 1; it took 4.5 seconds.
I used:

DELETE FROM `table` WHERE id NOT IN ( 1 )

Hope this helps.

LuiguiB
  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](https://stackoverflow.com/questions/ask). To get notified when this question gets new answers, you can [follow this question](https://meta.stackexchange.com/q/345661). Once you have enough [reputation](https://stackoverflow.com/help/whats-reputation), you can also [add a bounty](https://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question. - [From Review](/review/late-answers/33153105) – Jim Castro Nov 16 '22 at 15:09
0
DELETE FROM table_name WHERE id != 3;

It will delete all the data from the table except the row with the given id.