Instead of committing all changes at once (or 55 million times, as suggested in the other answers), I would rather split the update into small batches, e.g. 10k records as you suggested. In PL/pgSQL it is possible to iterate over a range with a given step using the keyword BY, and since PostgreSQL 11 transaction control (COMMIT) is allowed inside DO blocks, as long as the block is not called from within an outer transaction. So you could do your batch update in an anonymous code block like this:
PostgreSQL 11+
DO $$
DECLARE
  page int := 10000;
  min_id bigint;
  max_id bigint;
BEGIN
  SELECT max(id), min(id) INTO max_id, min_id FROM account;
  -- Step through the id range in chunks of "page" rows.
  FOR j IN min_id..max_id BY page LOOP
    UPDATE account SET name = 'your magic goes here'
    WHERE id >= j AND id < j + page;
    COMMIT;  -- persist this batch before starting the next one
  END LOOP;
END; $$;
You might want to adjust the WHERE clause to avoid unnecessary overlaps; see the sketch below.
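For instance, a minimal variant (same account table as above, assumed to have a numeric id column) that clamps the upper bound so the last batch never reaches past max_id:

DO $$
DECLARE
  page int := 10000;
  min_id bigint;
  max_id bigint;
BEGIN
  SELECT max(id), min(id) INTO max_id, min_id FROM account;
  FOR j IN min_id..max_id BY page LOOP
    UPDATE account SET name = 'your magic goes here'
    -- LEAST() keeps the last batch from reaching past max_id
    WHERE id >= j AND id < LEAST(j + page, max_id + 1);
    COMMIT;
  END LOOP;
END; $$;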
Testing
A data sample of 1051 rows with sequential ids:
CREATE TABLE account (id int, name text);
INSERT INTO account VALUES(generate_series(0,1050),'untouched record..');
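As a quick sanity check (the expected values follow directly from generate_series(0,1050)):

SELECT min(id), max(id), count(*) FROM account;
-- expected: min(id) = 0, max(id) = 1050, count = 1051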
Executing the anonymous code block ...
DO $$
DECLARE
  page int := 100;
  min_id bigint;
  max_id bigint;
BEGIN
  SELECT max(id), min(id) INTO max_id, min_id FROM account;
  FOR j IN min_id..max_id BY page LOOP
    -- Tag each row with the commit time and the batch range it fell into.
    UPDATE account SET name = now() || ' -> UPDATED ' || j || ' to ' || j + page
    WHERE id >= j AND id < j + page;
    RAISE INFO 'committing data from % to % at %', j, j + page, now();
    COMMIT;
  END LOOP;
END; $$;
INFO: committing data from 0 to 100 at 2021-04-14 17:35:42.059025+02
INFO: committing data from 100 to 200 at 2021-04-14 17:35:42.070274+02
INFO: committing data from 200 to 300 at 2021-04-14 17:35:42.07806+02
INFO: committing data from 300 to 400 at 2021-04-14 17:35:42.087201+02
INFO: committing data from 400 to 500 at 2021-04-14 17:35:42.096548+02
INFO: committing data from 500 to 600 at 2021-04-14 17:35:42.105876+02
INFO: committing data from 600 to 700 at 2021-04-14 17:35:42.114514+02
INFO: committing data from 700 to 800 at 2021-04-14 17:35:42.121946+02
INFO: committing data from 800 to 900 at 2021-04-14 17:35:42.12897+02
INFO: committing data from 900 to 1000 at 2021-04-14 17:35:42.134388+02
INFO: committing data from 1000 to 1100 at 2021-04-14 17:35:42.13951+02
... the output shows that the rows were indeed updated in batches. To make the point clearer, the following query counts the records grouped by update time:
SELECT name, count(id)
FROM account
GROUP BY name
ORDER BY name;
name | count
------------------------------------------------------+-------
2021-04-14 17:35:42.059025+02 -> UPDATED 0 to 100 | 100
2021-04-14 17:35:42.070274+02 -> UPDATED 100 to 200 | 100
2021-04-14 17:35:42.07806+02 -> UPDATED 200 to 300 | 100
2021-04-14 17:35:42.087201+02 -> UPDATED 300 to 400 | 100
2021-04-14 17:35:42.096548+02 -> UPDATED 400 to 500 | 100
2021-04-14 17:35:42.105876+02 -> UPDATED 500 to 600 | 100
2021-04-14 17:35:42.114514+02 -> UPDATED 600 to 700 | 100
2021-04-14 17:35:42.121946+02 -> UPDATED 700 to 800 | 100
2021-04-14 17:35:42.12897+02 -> UPDATED 800 to 900 | 100
2021-04-14 17:35:42.134388+02 -> UPDATED 900 to 1000 | 100
2021-04-14 17:35:42.13951+02 -> UPDATED 1000 to 1100 | 51
Demo: db<>fiddle
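If the ids are sparse (e.g. after many deletes), a fixed id step can produce empty or uneven batches. A keyset-style variant, sketched here under the assumption that id is indexed and non-negative, advances by row count instead:

DO $$
DECLARE
  page int := 10000;
  last_id bigint := -1;  -- assumes non-negative ids; adjust to your data
  cur_max bigint;
BEGIN
  LOOP
    -- Find the upper id bound of the next batch of "page" rows.
    SELECT max(id) INTO cur_max
    FROM (SELECT id FROM account
          WHERE id > last_id
          ORDER BY id
          LIMIT page) sub;
    EXIT WHEN cur_max IS NULL;  -- no rows left
    UPDATE account SET name = 'your magic goes here'
    WHERE id > last_id AND id <= cur_max;
    COMMIT;
    last_id := cur_max;
  END LOOP;
END; $$;

This touches exactly page rows per batch regardless of gaps, at the cost of one extra index scan per iteration.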