0

My SCHEMA is the following and I have ~ 4m existing posts in the DB that I need to update. I am adding an integer which points to a text location.

CREATE TABLE app_post (
id integer NOT NULL,
text_location integer,
title character varying(140)
);

I want to update existing records with a long (1000-5000) list of tuples that represent (id, text_location):

[(1, 123), (2,3), (9, 10)....]

What is the most efficient way to do this?

James
  • 91
  • 1
  • 11
  • Do you have a table storing those tuples, or you have them in a separate file? – Eggplant Sep 12 '13 at 14:39
  • They are being generated on the fly using python, but I could write them to a file and process them from there. – James Sep 12 '13 at 14:48
  • I think the *most efficent way* to do that would be having those values stored in a separate table (even a temporary one will be alright), and if you create some proper indexes on the `id` columns (if you define them like `id integer PRIMARY KEY` the index is implicit anyways), executing an `UPDATE FROM` query will be very fast. I'm making an answer to show a possible solution. – Eggplant Sep 12 '13 at 14:53

1 Answers1

1

If you are generating the values on the fly using phyton, you could:

  1. Create a buffer containing a single INSERT statement
  2. Start a transaction
  3. Create a temporary table and perform the INSERT statement in your buffer
  4. Perform an UPDATE ... FROM
  5. Commit the transaction, discarding the temporary table.

The UPDATE statement will look like this (assuming there is a table new_values containing those new values you need to update):

UPDATE app_post AS a SET text_location = n.text_location 
    FROM new_values AS n WHERE a.id = n.id

Don't forget to define the columns id as PRIMARY KEY or create an index on them.


EDIT : Since you are experiencing very slow performance, another workaround could be to recreate the whole table. The following idea assumes you don't have any FOREIGN KEY constraint applied to app_post table, as you have shown in your initial post.

-- Begin the Transaction
BEGIN;

-- Create a temporary table to hold the new values
CREATE TEMPORARY TABLE temp_update_values (
    id integer PRIMARY KEY,
    text_location integer
) ON COMMIT DROP;

-- Populate it
INSERT INTO temp_update_values (id, text_location) VALUES (1, 123), (2, 456) /* ... #5000 total */ ;

-- Create a temporary table merging the existing "app_post" and "temp_update_values"
CREATE TEMPORARY TABLE temp_new_app_post ON COMMIT DROP AS 
    SELECT a.id, COALESCE(n.text_location, a.text_location) AS text_location, a.title 
    FROM app_post AS a LEFT JOIN temp_update_values AS n ON a.id = n.id;

-- Empty the existing "app_post"
TRUNCATE TABLE app_post;

-- Repopulate "app_post" table 
INSERT INTO app_post (id, text_location, title) 
    SELECT id, text_location, title FROM temp_new_app_post;

-- Commit the Transaction
COMMIT;

If there are any FOREIGN KEY constraint, you should take care of them, dropping them before TRUNCATING the app_post table, and re-creating them after it's been repopulated.

Eggplant
  • 1,903
  • 1
  • 14
  • 24
  • Thanks! do I need to use Begin to start the transaction? – James Sep 12 '13 at 18:36
  • Maybe this is just slow whatever way I do it, but to run the update from new_values to app_post is taking 188seconds (and I have 4M posts to run through, so looking at almost a week). I ran the code above, and here is the explain analyze: https://code.stypi.com/znlqgrql – James Sep 12 '13 at 20:41
  • About the transaction, yes, you must use `BEGIN` before the temporary table creation, and `COMMIT` after the update. About the speed, it's definitely too much time. You don't have any triggers on the `app_post` table? The *fastest* way would then be to create a new table, this could be plain easy, or tricky depending on your actual `FOREIGN KEYS`. I'll update the answer. – Eggplant Sep 13 '13 at 08:10
  • I actually do have one Foreign key: – James Sep 13 '13 at 14:10
  • 1
    Are there any `FOREIGN KEYS` pointing to the `app_post` table? In this case you have to take care of them, just choose one way among many, here is a discussion about this subject: http://stackoverflow.com/questions/139884/how-do-i-disable-referential-integrity-in-postgres-8-2 – Eggplant Sep 13 '13 at 14:20