
I have a table with millions of records. The total size of that table alone is somewhere around 6-7 gigabytes. It is my application's log table, and it is growing really fast, which makes sense. Now I want to move records from the log table into a backup table. Here is the scenario, and here is my question.

Table Log_A
Insert into Log_b select * from Log_A;
Delete from Log_A;

I am using a PostgreSQL database. The question is:

When this query is performed, do all the records from Log_A get loaded into physical memory? Note: both of the above queries run inside a stored procedure. If not, then how does it work?

I believe this question applies to all databases.

I hope somebody can give me some idea on this.

prakashpoudel

3 Answers


In PostgreSQL, that's likely to execute a sequential scan, loading some records into shared_buffers, inserting them, writing the dirty buffers out, and carrying on.

All the records will pass through main memory, but they don't all have to be in memory at once. Because they all get read from disk using normal buffered reads (pread) it will affect the operating system disk cache, potentially pushing other data out of the cache.

Other databases may vary. Some could execute the whole SELECT before processing the INSERT (though I'd be surprised if any serious ones did). Some use O_DIRECT reads or raw disk I/O to avoid affecting the OS cache, so the buffer-cache effects might be different. I'd be amazed if any database relied on loading the whole SELECT into memory, though.

When you want to see what PostgreSQL is doing and how, the EXPLAIN and EXPLAIN (BUFFERS, ANALYZE) commands are quite useful. See the manual.
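For example, a minimal sketch of checking the plan and buffer usage of the copy step, using the table names from the question. Since ANALYZE actually executes the statement, you would typically wrap it in a transaction and roll it back if you only want to inspect the plan:

BEGIN;
EXPLAIN (BUFFERS, ANALYZE)   -- ANALYZE runs the INSERT for real
INSERT INTO log_b
SELECT * FROM log_a;
ROLLBACK;                    -- discard the copied rows after reading the plan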

You may find writable common table expressions interesting for this purpose; they let you do all this in one statement. In this simple case there's probably little benefit, but it can be a big win in more complex data migrations.
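A rough sketch of such a statement, assuming log_a and log_b have the same column layout: the DELETE hands every row it removes to the outer INSERT.

WITH moved AS (
    DELETE FROM log_a
    RETURNING *              -- each deleted row is made available to the outer query
)
INSERT INTO log_b
SELECT * FROM moved;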

BTW, make sure to run that pair of queries wrapped in BEGIN and COMMIT.
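Roughly like this, when the statements are run on their own rather than inside a function:

BEGIN;
INSERT INTO log_b SELECT * FROM log_a;
DELETE FROM log_a;
COMMIT;    -- if anything fails before this point, ROLLBACK and log_a keeps all its rows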

Craig Ringer
  • Sounds good, thanks for the explanation. By the way, since I am running these queries inside a Postgres function, do I still need to take care of BEGIN and COMMIT? Shouldn't the whole function be considered a single transaction, already wrapped with BEGIN and COMMIT? – prakashpoudel Oct 18 '12 at 02:14
  • @PRP Well, you didn't *say* it was encapsulated in a function. Any statement run without a surrounding transaction is automatically wrapped in one by the server, so you don't need (and can't use) explicit transaction management within a function. See http://stackoverflow.com/questions/12778209/are-postgres-functions-transactional – Craig Ringer Oct 18 '12 at 02:35
  • You might want to use a writable CTE (see link added to post). – Craig Ringer Oct 18 '12 at 02:37

Probably not.

Each record is individually processed; this particular query doesn't need to have knowledge of any of the other records to successfully execute. So the only record that needs to be in memory at any given moment is the one currently being processed.

But it really depends on whether or not the database thinks it can do it faster by loading up the whole table. Check the execution plan of the query.

Robert Harvey

If your setup allows it, just rename the old table and create a new empty one. Much faster, obviously, as no copying is done at all.

ALTER TABLE log_a RENAME TO log_b;
CREATE TABLE log_a (LIKE log_b INCLUDING ALL);

The LIKE clause copies the structure of the (now renamed) old table. INCLUDING ALL includes defaults, constraints, indexes, ...

Foreign key constraints or views that depend on the table, or other less common dependencies (but not queries in plpgsql functions), might be a hurdle for this route. You would have to recreate those so that they point to the new table. But a logging table like the one you describe probably carries no such dependencies.
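If such a dependency did exist, repointing it would look something like this sketch; the child table, column, and constraint names are purely hypothetical, and the old foreign key keeps following the renamed table until you drop it:

-- Hypothetical child table whose FK followed the rename (it now points at log_b);
-- drop it and recreate it against the new, empty log_a.
ALTER TABLE audit_detail DROP CONSTRAINT audit_detail_log_id_fkey;
ALTER TABLE audit_detail
    ADD CONSTRAINT audit_detail_log_id_fkey
    FOREIGN KEY (log_id) REFERENCES log_a (id);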

This acquires an exclusive lock on the table. I assume typical write access will be INSERT-only in your case? One way to deal with concurrent access would then be to create the new table in a different schema and alter the search_path for your application user. The application then starts writing to the new table without concurrency issues. Of course, for this to take effect you must not schema-qualify the table name in your INSERT statements.

CREATE SCHEMA log20121018;
CREATE TABLE log20121018.log_a (LIKE log20121011.log_a INCLUDING ALL);
ALTER ROLE myrole SET search_path = app, log20121018, public;

Or alter the search_path setting at whatever level is effective for you: globally, per database, per role, per session, per function ...
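For instance, a sketch of the same setting at two other levels; the database name is just a placeholder:

ALTER DATABASE mydb SET search_path = app, log20121018, public;   -- per database
SET search_path = app, log20121018, public;                       -- current session only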

Erwin Brandstetter