How can I update a table with random ids from another table in PostgreSQL

Question

I have a member table with a million records. The primary key is id. One column in this table is master_location_id which refers to the id column in master_location table which has 41481 records. master_location_id in member table is not populated yet. I am populating it using an anonymous block. It will fetch all records from member table, then loop, take a random record from master_location (select id from master_location order by random() limit 1 ), then update the member table. But it is very inefficient. Is it possible to do this with an SQL statement? PostgreSQL version is 13.2.

https://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql — IODEV, Apr 23 '21 at 11:46

Gordon Linoff · Accepted Answer · 2021-04-26T12:48:31.000

You can use row_number(). Assuming there are more master locations than members, you can use:

update members m
    set master_location_id = ml.id
    from (select m.*, row_number() over (order by id) as seqnum
          from members m
         ) mm join
         (select ml.*, row_number() over (order by random()) as seqnum
          from master_locations ml
         ) ml
         on ml.seqnum = mm.seqnum
    where mm.id = m.id;

Note that updating a million rows will take some time.

Also, this can easily be adjusted if there are fewer locations than members -- using modulo arithmetic for the JOIN conditions. That would be a tweak on the query.

EDIT:

The tweak for assigning multiple locations is:

update members m
    set master_location_id = ml.id
    from (select m.*, row_number() over (order by id) as seqnum
          from members m
         ) mm join
         (select ml.*,
                 row_number() over (order by random()) as seqnum,
                 count(*) over () as cnt
          from master_locations ml
         ) ml
         on ml.seqnum = mm.seqnum % ml.cnt
    where mm.id = m.id;

There are 1 million members, and 41481 locations (US zip codes) — Jayadevan, Apr 23 '21 at 12:07
I guess set m.master_location_id = ml.id should be set master_location_id = ml.id — Jayadevan, Apr 26 '21 at 04:50

How can I update a table with random ids from another table in PostgreSQL

1 Answers1