
I have a scenario where I have a central server and a node. Both server and node are capable of running PostgreSQL but the storage space on the node is limited. The node collects data at a high speed and writes the data to its local DB. The server needs to replicate the data from the node. I plan on accomplishing this with Slony-I or Bucardo. The node needs to be able to delete all records from its tables at a set interval in order to minimize disk space used. Should I use pgAgent with a job consisting of a script like

DELETE FROM tablex;
DELETE FROM tabley;
DELETE FROM tablez;

where the actual batch file to run the script would be something like

@echo off
C:\Progra~1\PostgreSQL\9.1\bin\psql -d database -h localhost -p 5432 -U postgres -f C:\deleteFrom.sql

?

I'm just looking for opinions on whether this is the best way to accomplish this task, or if anyone knows a more efficient way to pull data from a remote DB and then clear it to save space on the remote node. Thanks for your time.

babcoccl

2 Answers


The most efficient command for you is the TRUNCATE command.

Unlike DELETE, TRUNCATE accepts several tables in a single statement:

TRUNCATE tablex, tabley, tablez;

Here's the description from the postgres docs:

TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.

You may also add CASCADE as a parameter:

CASCADE Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.
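As a sketch, using the hypothetical table names from the question, a scheduled cleanup script could combine both: list all the tables in one statement and let CASCADE take care of any dependent tables.

```sql
-- Remove all rows from the three tables in one statement, also truncating
-- any tables that reference them via foreign keys:
TRUNCATE tablex, tabley, tablez CASCADE;

-- Optionally also reset sequences owned by columns of these tables:
TRUNCATE tablex, tabley, tablez RESTART IDENTITY CASCADE;
```

Note that TRUNCATE takes an ACCESS EXCLUSIVE lock on each table, so it will briefly block concurrent inserts from the collector.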

Bohemian

The two best options, depending on your exact needs and workflow, would be truncate, as @Bohemian suggested, or to create a new table, rename, then drop.

We use something much like the latter create/rename/drop method in one of our major projects. Its advantage is that you can very quickly discard some, but not all, of a table's data. The basic workflow is:

  1. Create a new table with a schema identical to the old one

    CREATE TABLE new_table (LIKE "table" INCLUDING ALL);

  2. In a transaction, rename the old and new tables simultaneously:

    BEGIN; ALTER TABLE "table" RENAME TO old_table; ALTER TABLE new_table RENAME TO "table"; COMMIT;

  3. [Optional] Now you can do stuff with the old table, while the new table is happily accepting new inserts. You can dump the data to your centralized server, run queries on it, or whatever.

  4. Delete the old table

    DROP TABLE old_table;
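Put together, and using a hypothetical table named measurements (the names are placeholders, not from the question), the whole rotation might look like:

```sql
-- 1. Clone the live table's structure (columns, defaults, indexes, ...):
CREATE TABLE measurements_new (LIKE measurements INCLUDING ALL);

-- 2. Swap the tables atomically; clients only ever see one "measurements":
BEGIN;
ALTER TABLE measurements RENAME TO measurements_old;
ALTER TABLE measurements_new RENAME TO measurements;
COMMIT;

-- 3. Replicate/dump measurements_old to the central server here, then:

-- 4. Discard the old data instantly:
DROP TABLE measurements_old;
```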

This is an especially useful strategy when you want to keep, say, 7 days of data around, and only discard the 8th day's data all at once. Doing a DELETE in this case can be very slow. By storing the data in partitions (one for each day), it is easy to drop an entire day's data at once.
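On PostgreSQL 9.1 this kind of daily partitioning is typically built on table inheritance; a minimal sketch (table and column names are hypothetical) might be:

```sql
-- Parent table that queries target:
CREATE TABLE readings (ts timestamptz NOT NULL, value double precision);

-- One child table per day, constrained to that day's range:
CREATE TABLE readings_2012_06_01 (
    CHECK (ts >= '2012-06-01' AND ts < '2012-06-02')
) INHERITS (readings);

-- Discarding a whole day is then a cheap metadata operation:
DROP TABLE readings_2012_06_01;
```

Routing inserts to the correct child (via a trigger or by inserting into the child directly) is left out of this sketch.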

Jonathan Hall
  • What happens in postgres if you have views defined over the table(s)? Answer: These commands will explode. – Bohemian Jun 06 '12 at 00:28
  • @Bohemian: You can drop and recreate the views... or you can use a parent table, and only drop/recreate child tables. We use this strategy quite effectively. – Jonathan Hall Jun 06 '12 at 00:49
  • Yeah, I know you can, but then you have to *know* what views to recreate. How do you maintain that? What if someone adds/drops a view - you must edit your script too! You've just unnecessarily bolted an operations process to a schema issue. It becomes an unnecessary maintenance hassle. Besides, your solution simply isn't a good one - it doesn't perform well and you can blow logs if your tables are massive. TRUNCATE is the right approach here. I recommend you try it yourself. Trust me... you'll be glad you did – Bohemian Jun 06 '12 at 06:34
  • @Bohemian: I adamantly disagree with the statement that my solution "simply isn't a good one" in all situations. I'd also be the last person to say it's always the best solution. There are times when TRUNCATE is entirely the wrong solution--like when you don't need to delete all records (as mentioned in the last paragraph of my answer). – Jonathan Hall Jun 06 '12 at 13:49
  • @Bohemian: If you are dealing with a highly volatile schema, then recreating views could indeed be a challenge. For this, I would suggest the parent/child table approach--if there were reasons TRUNCATE would not work. – Jonathan Hall Jun 06 '12 at 13:51