0

I have a member table with a million records. The primary key is id. One column in this table is master_location_id which refers to the id column in master_location table which has 41481 records. master_location_id in member table is not populated yet. I am populating it using an anonymous block. It will fetch all records from member table, then loop, take a random record from master_location (select id from master_location order by random() limit 1 ), then update the member table. But it is very inefficient. Is it possible to do this with an SQL statement? PostgreSQL version is 13.2.

Jayadevan
  • 1,306
  • 2
  • 12
  • 33

1 Answers1

1

You can use row_number(). Assuming there are more master locations than members, you can use:

update members m
    set master_location_id = ml.id
    from (select m.*, row_number() over (order by id) as seqnum
          from members m
         ) mm join
         (select ml.*, row_number() over (order by random()) as seqnum
          from master_locations ml
         ) ml
         on ml.seqnum = mm.seqnum
    where mm.id = m.id;

Note that updating a million rows will take some time.

Also, this can easily be adjusted if there are fewer locations than members -- using modulo arithmetic for the JOIN conditions. That would be a tweak on the query.

EDIT:

The tweak for assigning multiple locations is:

update members m
    set master_location_id = ml.id
    from (select m.*, row_number() over (order by id) as seqnum
          from members m
         ) mm join
         (select ml.*,
                 row_number() over (order by random()) as seqnum,
                 count(*) over () as cnt
          from master_locations ml
         ) ml
         on ml.seqnum = mm.seqnum % ml.cnt
    where mm.id = m.id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786