I am trying to make an update query run faster in Postgres. The query is relatively simple, and I have broken it up into id-range chunks so I can spread the work across all of the CPUs on my database server.
UPDATE p797.line a SET p = 5 FROM p797.pt b WHERE a.source = b.node AND a.id >= 0 AND a.id < 40000000
where "0" and "40000000" are replaced with different values as you move through all the rows in the table. The "line" table has 1.3 billion records and the pt table has 500 million.
Right now this process runs in about 16 hours. I have other update queries that I need to perform, and if each one takes 16 hours the results will take weeks to acquire.
I found something interesting that I would like to try, but I am unsure whether it can be implemented in my case, since I am running the queries over a network:
Slow simple update query on PostgreSQL database with 3 million rows
Here, Le Droid makes reference to COPY, a method I believe I cannot employ because I am running over a network. They also use BUFFER, which I do not understand how to use. In addition, both of my tables reside in the same database, rather than the database-table-plus-CSV combination in that answer. How can I massage my query to get the gains that @Le Droid mentions? Is there another methodology I could employ to see time gains? I did see Le Droid mention that HOT only gives marginal gains at considerable cost. Other methods?
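For what it's worth, my understanding is that psycopg2 can stream COPY over a network connection through copy_expert, along the lines of the sketch below, but I do not see how that maps onto two tables that already live in the same database (the connection string and the query here are only illustrative):

# Hedged sketch only: streaming COPY through psycopg2 over the network.
# The DSN and the COPY statement are illustrative, not a recipe for my
# actual update.
import io
import psycopg2

conn = psycopg2.connect("host=dbserver dbname=mydb user=me")  # placeholder
buf = io.StringIO()
with conn.cursor() as cur:
    # Pull rows off the server as CSV text over the wire ...
    cur.copy_expert("COPY (SELECT id, p FROM p797.line LIMIT 10) TO STDOUT WITH CSV", buf)
    # ... and the same mechanism can push rows back with COPY ... FROM STDIN.
print(buf.getvalue())
conn.close()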
It might also be noteworthy that I am creating the queries in Python and sending them to the Postgres database using psycopg2.
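Concretely, the driver loop looks something like the sketch below (the connection string and worker count are placeholders; statements is the list built in the snippet above):

# Simplified sketch of the dispatch loop. The DSN and worker count are
# placeholders; each worker opens its own connection so the id-range
# chunks run in separate server sessions at the same time.
from multiprocessing import Pool
import psycopg2

DSN = "host=dbserver dbname=mydb user=me"  # placeholder connection string

def run_chunk(sql):
    conn = psycopg2.connect(DSN)
    try:
        with conn:                      # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute(sql)        # one id-range UPDATE per call
    finally:
        conn.close()

if __name__ == "__main__":
    with Pool(processes=8) as pool:     # roughly one worker per CPU
        pool.map(run_chunk, statements)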
EDIT:
Here is the EXPLAIN output for the statement above, without the id-range restriction:
"Update on line a (cost=10665536.12..342338721.96 rows=1381265438 width=116)"
" -> Hash Join (cost=10665536.12..342338721.96 rows=1381265438 width=116)"
" Hash Cond: (a.source= b.node)"
" -> Seq Scan on line a (cost=0.00..52953645.38 rows=1381265438 width=102)"
" -> Hash (cost=8347277.72..8347277.72 rows=126271072 width=22)"
" -> Seq Scan on pt b (cost=0.00..8347277.72 rows=126271072 width=22)"