1

I wonder what the fastest way to accomplish this kind of task in PostgreSQL is. I am interested in the fastest solution possible.

I have already found such a solution for MySQL; it performs much faster than just truncating the tables one by one. (I am interested in the fastest solutions for MySQL too.) See my results here, although they are for MySQL only: https://github.com/bmabey/database_cleaner/issues/126

I have the following assumptions:

  • I have 30-100 tables. Let them be 30.
  • Half of the tables are empty.
  • Each non-empty table has, say, no more than 100 rows. By this I mean the tables are NOT large.
  • I need the option to exclude 2, 5, or N tables from this procedure.

  • I cannot use transactions!

I need the fastest cleaning strategy for such a case, working on both PostgreSQL 8 and 9.

I see the following approaches:

  1. Truncate each table. I think it is too slow, especially for empty tables.

  2. Check each table for emptiness by some faster method, and if it is empty, reset its unique identifier column (the analog of AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the sequence's last_value back to 1; otherwise run TRUNCATE on it.

I use Ruby code to iterate through all tables, calling the code below on each of them. I tried to set up the SQL code to run against each table like this:

DO $$DECLARE r record;
BEGIN
  somehow_captured = SELECT last_value from #{table}_id_seq
  IF (somehow_captured == 1) THEN
    == restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;

Manipulating this code in various ways, I couldn't make it work, because I am unfamiliar with PostgreSQL functions and blocks (and variables).

My guess was also that EXISTS(SELECT something FROM table) could somehow be used as one of the "check procedure" units the cleaning procedure should consist of, but I haven't accomplished that either.

I would appreciate any hints on how this procedure could be accomplished in a PostgreSQL-native way.
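
For clarity, the kind of block I am after would look roughly like this sketch (my_table and my_table_id_seq are placeholders my Ruby loop would interpolate; DO needs PostgreSQL 9.0+, so on 8.x the same body would have to live in a throwaway function, and TRUNCATE ... RESTART IDENTITY needs 8.4+):

DO $$
DECLARE
  has_rows boolean;
BEGIN
  -- emptiness check: EXISTS stops at the first row it finds
  SELECT EXISTS (SELECT 1 FROM my_table) INTO has_rows;

  IF has_rows THEN
    -- non-empty: truncate and reset the owned sequences in one go
    TRUNCATE TABLE my_table RESTART IDENTITY;
  ELSE
    -- already empty: only put the sequence back to its initial state
    PERFORM setval('my_table_id_seq', 1, false);  -- next nextval() returns 1
  END IF;
END$$;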

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or should clean up after itself (so-called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers; in my case a switch to the truncation strategy is needed. Now that I have updated this with a reference to RoR, please do not post answers like "Obviously, you need DatabaseCleaner for PG" and so on.

UPDATE 2:

The strategy described here was recently merged into DatabaseCleaner, https://github.com/bmabey/database_cleaner, as the :pre_count option (see the README there).

Stanislav Pankevich
  • "*I am interested in the fastest solutions ever possible*". Disconnect your storage media? Seriously though, what exactly are you trying to accomplish? If it's atomicity, perform your operations within a transaction. – eggyal Jul 03 '12 at 12:48
  • Actually, it is for running tests. This procedure will mainly serve as a before or teardown hook for each Capybara Ruby on Rails integration test that is run. – Stanislav Pankevich Jul 03 '12 at 12:53
  • Why is truncating each table slow? Empty tables won't take any time to be truncated in Postgres (much faster than counting the rows before) –  Jul 03 '12 at 13:04
  • @a_horse_with_no_name, your assertions are very short and firm! I hoped that my question deserved a more verbose answer. Could you please post a link to the official docs or some benchmark tests? How can I rely on what you're saying when you post it in short comments? Sorry if I badly misunderstand some PG basics, but especially then it would be nice to have a good, verbose answer written by you. – Stanislav Pankevich Jul 03 '12 at 13:20
  • It's documented in the manual. See my answer –  Jul 03 '12 at 13:28
  • You already ruled out the transaction approach that was found to be very fast in [this related question](http://stackoverflow.com/q/9407442/939860). It's also a good read because it mentions additional options like `fsync`, `full_page_writes` and *RAM disk*. – Erwin Brandstetter Jul 16 '12 at 19:17

5 Answers

3

PostgreSQL can truncate many tables in a single TRUNCATE TABLE statement. Don't bother iterating and just do

TRUNCATE TABLE table1,table2,table3,...,table30;
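
If a few tables have to be preserved (as the question requires), the list does not need to be hard-coded. A possible sketch, not part of the original answer, assuming PostgreSQL 9.0+ for DO and string_agg, with hypothetical names in the exclusion list:

DO $$
DECLARE
  stmt text;
BEGIN
  -- collect every table in the public schema except the ones to keep
  SELECT 'TRUNCATE TABLE ' || string_agg(quote_ident(tablename), ', ')
    INTO stmt
  FROM pg_tables
  WHERE schemaname = 'public'
    AND tablename NOT IN ('schema_migrations', 'some_table_to_keep');

  IF stmt IS NOT NULL THEN  -- NULL when nothing matched
    EXECUTE stmt;
  END IF;
END$$;
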
Daniel Vérité
  • MySQL experience shows me that two right SELECTs and one core MySQL function do it much faster than the analogous truncation run. I knew that most of the answers would be like yours when I was writing this. – Stanislav Pankevich Jul 03 '12 at 12:57
  • @Stanislaw: how exactly does a "*right select*" delete rows from a table? –  Jul 03 '12 at 13:03
  • Actually, I am using a multiple-truncation statement like yours as the starting point for my tuning. – Stanislav Pankevich Jul 03 '12 at 13:04
  • @Stanislaw: Don't bother checking if the tables contain rows in PostgreSQL. That won't speed up anything. –  Jul 03 '12 at 13:05
  • @a_horse_with_no_name, sorry, I forgot to add that they serve as the condition by which TRUNCATE is run. If the condition is not met, TRUNCATE is simply not run; in that case I manually reset AUTO_INCREMENT. I'll show this a bit later. As tends to happen on SO, some brilliant solutions may suddenly appear; I asked this question hoping they would. – Stanislav Pankevich Jul 03 '12 at 13:08
  • @Stanislaw: why would you reset AUTO_INCREMENT (or the PostgreSQL equivalent) when you do ***not*** truncate the table? That will definitely create problems with the next generated ID. The more you explain the more confusing things get (at least for me) –  Jul 03 '12 at 13:21
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13361/discussion-between-stanislaw-and-a-horse-with-no-name) – Stanislav Pankevich Jul 03 '12 at 13:37
  • 3
    @Stanislaw: whatever caveats MySQL's TRUNCATE may have (and it has some according to the documentation, depending on the version), there is no reason for them to be the same in PostgreSQL. All DBMSs have their own specific idiosyncrasies. Your assertion #1, that truncating everything would be too slow, is unsubstantiated. – Daniel Vérité Jul 03 '12 at 13:42
  • I am not insisting on it. Again, I am asking: did you really try to make it faster, or are you pushing the very probable and thus "certain" default here? Somehow, all these "just do the one TRUNCATE {tables}" answers sound unconvincing to me. Anyway, if I am completely wrong, we could just wait some more time. Or maybe some PG core developers are among the comment authors - then say it loudly, please! – Stanislav Pankevich Jul 03 '12 at 13:52
  • MySQL does not allow multiple tables in this way. – Barry Jul 19 '16 at 13:17
2

See also:

Postgresql Truncation speed

for discussion of why truncation can be slower on Pg, and why DELETE isn't the same thing.

Craig Ringer
1

As requested in the comment (although I don't feel this is the right answer, it's too long for a comment):

There is no (noticeable) performance difference between truncating an empty table and truncating a large table.

As stated in the manual (http://www.postgresql.org/docs/current/static/sql-truncate.html), "it does not actually scan the tables".

So if you first check whether there are any rows in the table, you will scan the table - something that won't happen if you simply issue the TRUNCATE without bothering to check whether it is empty.
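
For illustration (my_table is a hypothetical name), only the pre-check reads the table's data:

SELECT EXISTS (SELECT 1 FROM my_table);  -- scans until it finds a first row
TRUNCATE TABLE my_table;                 -- no scan, whatever the table's size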

1

[I don't know RoR]

A nice way to start with a clean slate is to create and use a temp SCHEMA:

DROP SCHEMA fuzz CASCADE;
CREATE SCHEMA fuzz;
SET search_path='fuzz';

(This is what I use to test SQL snippets.) But this would create an empty schema, and you cannot copy schemas, AFAIK.

The other way would be to create your database (including the empty tables) and use that as a template for constructing the test-rig:

DROP DATABASE testdb;
CREATE DATABASE testdb TEMPLATE my_special_template;

The problem with that is that you cannot drop the database if there are still connections to it (such as the drop process itself). So your front-end should first disconnect, then temporarily connect to some other DB (such as my_special_template), then dropdb+createdb, then connect to testdb. I don't know about performance, but at least it is a robust scheme.
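
A minimal sketch of that dance, assuming pg_terminate_backend (8.4+) and a template named my_special_template; run it while connected to some other maintenance database (e.g. postgres):

-- kick out everything still connected to testdb
-- (the column is procpid up to 9.1 and pid from 9.2 on)
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE datname = 'testdb';

DROP DATABASE IF EXISTS testdb;
CREATE DATABASE testdb TEMPLATE my_special_template;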

wildplasser
  • Thanks for the suggestion. I can't use it, because I need the option to skip, and thus preserve, any of the given tables when running tests. – Stanislav Pankevich Jul 13 '12 at 08:03
  • Missed your answer and just posted a similar one. Deleted as dupe. Template database would be my first idea, too. – Erwin Brandstetter Jul 16 '12 at 19:15
  • @Stanislaw: You could rotate two (or more) databases for your tests. When you need to preserve a table, you could copy contents over with [dblink](http://www.postgresql.org/docs/current/interactive/contrib-dblink-function.html) which is pretty fast, too. This would also remove the necessity to connect to a different database before you can `DROP` / `CREATE`. – Erwin Brandstetter Jul 16 '12 at 19:20
  • The problem is: these app-type youngsters are so occupied with speed & performance ;-) – wildplasser Jul 16 '12 at 19:20
  • @wildplasser: Call me a youngster, but I don't mind a bit of performance myself. :) That is - as long as the solution is solid. – Erwin Brandstetter Jul 16 '12 at 19:23
  • The problem is: I am not willing to benchmark it, but my gut feeling is that DROP+CREATE (using a customised template) will cost one (maybe a few) second(s). And it solves all the problems with prefilled tables, sequences, access rights, etc. – wildplasser Jul 16 '12 at 19:30
1

If anyone is interested in the current strategy I use for this, see this Ruby-based repo covering both MySQL and PostgreSQL: https://github.com/stanislaw/truncate-vs-count

My results:

MySQL: the fastest strategy for cleaning databases is truncation with the following modifications (a rough SQL sketch of the checks follows the list below):

if table is not empty
  truncate
else
  if AUTO_INCREMENT is not 1
    truncate
  end
end
  • For MySQL, plain truncation is much faster than plain deletion. The only case where DELETE wins over TRUNCATE is on an empty table.
  • For MySQL, truncation with empty checks is much faster than just truncating every table.
  • For MySQL, deletion with empty checks is much faster than just running DELETE on each table.
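
For MySQL the two checks look roughly like this (my_table is a placeholder; this is a sketch of the idea, not the exact code from the repo):

SELECT EXISTS (SELECT 1 FROM my_table);          -- any rows left?
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'my_table';                   -- has the counter moved past 1?
-- TRUNCATE TABLE my_table;  -- run only if either check says the table is "dirty"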

PostgreSQL: the fastest strategy for cleaning databases is deletion with the same empty checks as for MySQL, but relying on currval instead (a rough SQL sketch follows the list below):

if table is not empty
  delete from table
else
  if currval is not 1
    delete from table
  end
end
  • For PostgreSQL, plain deletion is much faster than TRUNCATE (even a multi-table TRUNCATE).
  • For PostgreSQL, multiple TRUNCATE with empty checks done beforehand is slightly faster than just multiple TRUNCATE.
  • For PostgreSQL, deletion with empty checks is slightly faster than plain deletion.
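
For PostgreSQL the per-table logic looks roughly like this (placeholder names; DO needs 9.0+; a sketch of the idea, not the exact repo code):

DO $$
DECLARE
  has_rows  boolean;
  seq_value bigint;
BEGIN
  SELECT EXISTS (SELECT 1 FROM my_table) INTO has_rows;
  SELECT last_value INTO seq_value FROM my_table_id_seq;

  IF has_rows OR seq_value > 1 THEN
    DELETE FROM my_table;                         -- cheap on small tables
    PERFORM setval('my_table_id_seq', 1, false);  -- next nextval() returns 1 again
  END IF;
END$$;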

This is where it all began: https://github.com/bmabey/database_cleaner/issues/126

This is the resulting code and the long discussion: https://github.com/bmabey/database_cleaner/pull/127

This is discussion on pgsql-performance mailing list: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php

We have begun collecting user feedback confirming that my idea of first checking for empty tables is right.

Stanislav Pankevich