I am trying to update every row in order_item. Status is a newly created column, and must have the latest value from the order_update table. One item can have several updates.
I am using PostgreSQL 9.1
I have this update sql.
The table order_item
has 800K records.
The table order_update
has 5Mil records.
update order_item
set status = (
select production_stage
from order_update
where id = (
select max(id)
from order_update
where order_item_id = order_item.id
)
);
How can I make this sql perform the best way. I know the update will take some time, just want to have it as fast as possible.
I found that when doing just this sql on 5Mil records.
select max(id) from order_update where order_item_id = 100;
Explain:
Result (cost=784.10..784.11 rows=1 width=0)" InitPlan 1 (returns $0)
-> Limit (cost=0.00..784.10 rows=1 width=8)
-> Index Scan Backward using order_update_pkey on order_update (cost=0.00..104694554.13 rows=133522 width=8)
Index Cond: (id IS NOT NULL)
Filter: (order_item_id = 100)
it takes about 6 seconds.
When I do the same sql in 1Mil records:
Explain:
Aggregate (cost=13.43..13.44 rows=1 width=8) -> Index Scan using
order_update_order_item_id_idx on order_update (cost=0.00..13.40
rows=11 width=8)
Index Cond: (order_item_id = 100)
it takes around 11 ms.
11 ms vs. 6 sec. Why the HUGE diff?
To narrow it down a little I try this :
select id from order_update where order_item_id = 100 order by id asc
limit 1
Total query runtime: 41 ms.
and then this :
select id from order_update where order_item_id = 100 order by id desc
limit 1
Total query runtime: 5310 ms.
so a huge diff in asc and desc.
Solution : Create index :
CREATE INDEX order_update_mult_idx ON order_update (order_item_id, id DESC);
Update :
UPDATE order_item i
SET test_print_provider_id = u.test_print_provider_id
FROM (
SELECT DISTINCT ON (1)
test_print_provider_id
FROM orders
ORDER BY 1, id DESC
) u
WHERE i.order_id = u.id
AND i.test_print_provider_id IS DISTINCT FROM u.test_print_provider_id;