Link duplicate records in PostgreSQL

Question

How can I link duplicate records in PostrgreSQL? I've found them with:

SELECT * FROM (
  SELECT id, import_id, name,
  ROW_NUMBER() OVER(PARTITION BY address ORDER BY name asc) AS Row
  FROM companies
) dups
where 
dups.Row > 1 ORDER BY dups.name;

See sample code and demo at http://sqlfiddle.com/#!15/af016/7/1

I want to add a column to companies called linked_id, that will be set to the import_id of the first of each set of duplicate records.

It might be a good idea not to use "Row" as a column alias. It's confusing, and it's also a keyword in some contexts. BTW, questions like this are *much* easier to answer if you post some sample data / schema as `CREATE TABLE` and `INSERT` statements. SQLFiddle.com can be handy for this, and has a text-to-sql conversion tool. — Craig Ringer, Dec 15 '13 at 17:00
"Row" came from the example I found at http://stackoverflow.com/questions/14471179 — Circuitsoft, Dec 16 '13 at 00:43

krokodilko · Accepted Answer · 2013-12-16T00:44:48.710

1

Try:

UPDATE companies c
SET import_id = q.import_id
FROM (
  SELECT id, 
  FIRST_VALUE(import_id) 
      OVER(PARTITION BY name, address ORDER BY name asc) AS import_id,
  ROW_NUMBER() 
      OVER(PARTITION BY name, address ORDER BY name asc) AS Rn
  FROM companies
) q
WHERE c.id = q.id AND q.rn > 1
;

Demo: http://sqlfiddle.com/#!15/af016/10

edited Dec 16 '13 at 00:44

answered Dec 16 '13 at 00:38

krokodilko

35,300
7
55
79

I found my answer just as you posted your. Thanks! – Circuitsoft Dec 16 '13 at 01:13

score 1 · Answer 2 · answered Dec 16 '13 at 01:12

This will set parent_id to the import_id of the first company to match.

UPDATE companies
SET parent_id=rs.parent_id FROM
(SELECT id, first_value(import_id)
 OVER (PARTITION BY address ORDER BY name) as parent_id
 FROM companies
) AS rs
WHERE rs.id=companies.id;

Link duplicate records in PostgreSQL

2 Answers2