0

How can I link duplicate records in PostrgreSQL? I've found them with:

SELECT * FROM (
  SELECT id, import_id, name,
  ROW_NUMBER() OVER(PARTITION BY address ORDER BY name asc) AS Row
  FROM companies
) dups
where 
dups.Row > 1 ORDER BY dups.name;

See sample code and demo at http://sqlfiddle.com/#!15/af016/7/1

I want to add a column to companies called linked_id, that will be set to the import_id of the first of each set of duplicate records.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • It might be a good idea not to use "Row" as a column alias. It's confusing, and it's also a keyword in some contexts. BTW, questions like this are *much* easier to answer if you post some sample data / schema as `CREATE TABLE` and `INSERT` statements. SQLFiddle.com can be handy for this, and has a text-to-sql conversion tool. – Craig Ringer Dec 15 '13 at 17:00
  • "Row" came from the example I found at http://stackoverflow.com/questions/14471179 – Circuitsoft Dec 16 '13 at 00:43

2 Answers2

1

Try:

UPDATE companies c
SET import_id = q.import_id
FROM (
  SELECT id, 
  FIRST_VALUE(import_id) 
      OVER(PARTITION BY name, address ORDER BY name asc) AS import_id,
  ROW_NUMBER() 
      OVER(PARTITION BY name, address ORDER BY name asc) AS Rn
  FROM companies
) q
WHERE c.id = q.id AND q.rn > 1
;

Demo: http://sqlfiddle.com/#!15/af016/10

krokodilko
  • 35,300
  • 7
  • 55
  • 79
1

This will set parent_id to the import_id of the first company to match.

UPDATE companies
SET parent_id=rs.parent_id FROM
(SELECT id, first_value(import_id)
 OVER (PARTITION BY address ORDER BY name) as parent_id
 FROM companies
) AS rs
WHERE rs.id=companies.id;