I have a member table with a million records. The primary key is id. One column in this table is master_location_id which refers to the id column in master_location table which has 41481 records. master_location_id in member table is not populated yet. I am populating it using an anonymous block. It will fetch all records from member table, then loop, take a random record from master_location (select id from master_location order by random() limit 1 ), then update the member table. But it is very inefficient. Is it possible to do this with an SQL statement? PostgreSQL version is 13.2.
Asked
Active
Viewed 606 times
0
-
https://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql – IODEV Apr 23 '21 at 11:46
1 Answers
1
You can use row_number()
. Assuming there are more master locations than members, you can use:
update members m
set master_location_id = ml.id
from (select m.*, row_number() over (order by id) as seqnum
from members m
) mm join
(select ml.*, row_number() over (order by random()) as seqnum
from master_locations ml
) ml
on ml.seqnum = mm.seqnum
where mm.id = m.id;
Note that updating a million rows will take some time.
Also, this can easily be adjusted if there are fewer locations than members -- using modulo arithmetic for the JOIN
conditions. That would be a tweak on the query.
EDIT:
The tweak for assigning multiple locations is:
update members m
set master_location_id = ml.id
from (select m.*, row_number() over (order by id) as seqnum
from members m
) mm join
(select ml.*,
row_number() over (order by random()) as seqnum,
count(*) over () as cnt
from master_locations ml
) ml
on ml.seqnum = mm.seqnum % ml.cnt
where mm.id = m.id;

Gordon Linoff
- 1,242,037
- 58
- 646
- 786
-
-
I guess set m.master_location_id = ml.id should be set master_location_id = ml.id – Jayadevan Apr 26 '21 at 04:50