@kordirko made clear that your DO
statement is needlessly expensive. But there's more.
Probability
The probability in your update is not evenly distributed:
- 1/6 or all rows are updated to
'variable1'
.
- But only 5/36 (
(1/6) * (5/6)
) are updated to 'variable2'
.
Judging from the rest of your code, I am going to assume this is an unintended error and you want an equal 1/6 share for each.
Sanitize
You can simplify to:
UPDATE tablex
SET test = CASE trunc(random() * 6)
WHEN float '4' THEN 'variable1'
WHEN float '5' THEN 'variable2'
ELSE test
END;
It's pointless to add 1 to the result. Rather compare to 4
and 5
instead of 5
and 6
(or 3
and 1
- no difference here).
It's a bit cheaper to compare to double precision
(= float
) constants instead of casting the double precision
result of the expression trunc(random() * 6)
to integer
for every row, like it would happen in your original code.
Better, but still very inefficient.
Superior
UPDATE tablex
SET test = CASE WHEN random() >= float '0.5' THEN 'variable1'
ELSE 'variable2' END
WHERE random() >= float '0.6666667';
100 % equivalent (unless you have triggers ON UPDATE
or some exotic setup) but much faster yet, because only rows that actually receive an update are touched. Two thirds of the rows are not touched at all.
Note that the two random()
calls are completely independent.
Use 2 / float '3.0'
instead of float '0.6666667'
if you want to be as precise as possible about the probability of two thirds. But that's really an academic difference.
You might want to LOCK tablex IN ROW EXCLUSIVE MODE
before you run this (in the same transaction) to rule out race conditions with concurrent writes. Details in the manual.
Alternative
If the column test
can already hold one of the target values 'variable1' or 'variable2' (in more than a few cases) this is cheaper, yet:
UPDATE tablex t
SET test = upd.val
FROM (
SELECT id, CASE WHEN random() >= float '0.5' THEN 'variable1'
ELSE 'variable2' END AS val
FROM tablex
WHERE random() >= float '0.6666667'
-- ORDER BY id -- the last two lines only to defend against ...
-- FOR UPDATE -- ... concurrent writes and possible deadlocks
) upd
WHERE t.id = upd.id
AND t.test IS DISTINCT FROM upd.val;
Avoiding more empty updates.
If test
is defined NOT NULL
you can simplify to:
AND t.test <> upd.val;
Compare the last chapter of this related answer: