In a table with > 100k rows, how can I efficiently shuffle the values of a specific column?
Table definition:
CREATE TABLE person
(
id integer NOT NULL,
first_name character varying,
last_name character varying,
CONSTRAINT person_pkey PRIMARY KEY (id)
)
In order to anonymize data, I have to shuffle the values of the 'first_name' column in place (I'm not allowed to create a new table).
My try:
with
first_names as (
select row_number() over (order by random()),
first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()),
id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names, ids
where id = ref_id;
It takes hours to complete.
Is there an efficient way to do it?