I am attempting to update multiple columns on a table with values from another row in the same table:
CREATE TEMP TABLE person (
pid INT
, name VARCHAR(40)
, dob DATE
, younger_sibling_name VARCHAR(40)
, younger_sibling_dob DATE
);
INSERT INTO person VALUES (pid, name, dob)
(1, 'John' , '1980-01-05')
, (2, 'Jimmy', '1975-04-25')
, (3, 'Sarah', '2004-02-10')
, (4, 'Frank', '1934-12-12')
;
The task is to populate younger_sibling_name
and younger_sibling_dob
with the name and birthday of the person that is closest to them in age, but not older or the same age.
I can set the younger sibling dob
easily because this is the value that determines the record to use with a correlated subquery (I think this is an example of that?):
UPDATE person SET younger_sibling_dob = (
SELECT MAX(dob)
FROM person AS sibling
WHERE sibling.dob < person.dob);
I just can't see any way to get the name
?
The real query of this will run over about 1M rows in groups of 100-500 for each MAX selection so performance is a concern.
Edit
After trying many different approaches, I've decided on this one which I think is a good balance of being able to verify the data with the intermediate result, shows the intention of what the logic is, and performs adequately:
WITH sibling AS (
SELECT person.pid, sibling.dob, sibling.name,
row_number() OVER (PARTITION BY person.pid
ORDER BY sibling.dob DESC) AS age_closeness
FROM person
JOIN person AS sibling ON sibling.dob < person.dob
)
UPDATE person
SET younger_sibling_name = sibling.name
,younger_sibling_dob = sibling.dob
FROM sibling
WHERE person.pid = sibling.pid
AND sibling.age_closeness = 1;
SELECT * FROM person ORDER BY dob;