Find duplicates comparing 2 fields in PostgreSQL

Question

I have a table with the following data:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi
3   456         Bar         it
4   345         Foo         fr

I want to select all records that share both parent_id and ascii_name with at least one other record. The expected result is:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi

So far I have only managed to select the records that share the same ascii_name:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi
4   345         Foo         fr

using the query:

SELECT * FROM table WHERE ascii_name in 
(SELECT ascii_name FROM table GROUP By ascii_name
 HAVING "count"(ascii_name) > 1)

I don't know how to put the parent_id into the equation.

Update

I found the right query by combining the answers from @jakub and @mucio:

SELECT * FROM geo_nodes_copy WHERE (parent_id,ascii_name) in 
(SELECT parent_id, ascii_name 
 FROM geo_nodes_copy 
 GROUP By parent_id, ascii_name 
 HAVING count (1) > 1)

The only remaining problem may be the query speed.

Much like this: http://stackoverflow.com/questions/54418/how-do-i-or-can-i-select-distinct-on-multiple-columns/12632129#12632129 — Erwin Brandstetter, Apr 16 '15 at 13:33

score 1 · Answer 1 · answered Apr 16 '15 at 10:13

Use the following query as a subquery:

   SELECT parent_id, 
          ascii_name 
     FROM table 
 GROUP By parent_id, 
          ascii_name 
   HAVING count (1) > 1

This returns every parent_id/ascii_name pair that appears in more than one row.
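
As a rough sketch of one way to use this subquery, the grouped pairs can be joined back to the table to get the complete rows. The table name geo_nodes_copy is taken from the question's update, and the aliases t and dup are only illustrative.

```sql
-- Sketch: join the duplicated (parent_id, ascii_name) pairs back to the full rows.
-- geo_nodes_copy is the table name from the question's update; t and dup are illustrative aliases.
SELECT t.*
FROM geo_nodes_copy AS t
JOIN (
    SELECT parent_id, ascii_name
    FROM geo_nodes_copy
    GROUP BY parent_id, ascii_name
    HAVING count(*) > 1          -- keep only pairs that occur more than once
) AS dup
  ON dup.parent_id = t.parent_id
 AND dup.ascii_name = t.ascii_name;
```

Rows where parent_id or ascii_name is NULL never match in this join, because NULL = NULL does not evaluate to true in SQL.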

mucio


Jakub Kania · Answer 2 · 2015-04-16T13:00:48.243

1

Well, since it's PostgreSQL, you can use a row constructor:

SELECT * FROM table WHERE (ascii_name,parent_id) in 
(SELECT ascii_name, parent_id FROM table GROUP By ascii_name, parent_id HAVING Count(ascii_name) > 1)

edited Apr 16 '15 at 13:00

answered Apr 16 '15 at 10:14

this will not run because you need to add `parent_id` in the _GROUP BY_ clause – Vivek S. Apr 16 '15 at 11:31
@vivek yeah, that's what I used, the problem now is performance – Fed03 Apr 16 '15 at 13:06
@Fed03 [This answer](http://stackoverflow.com/questions/29671548/find-duplicates-comparing-2-fields-in-postgresql/29672388#29672388) can be used for better performance – Vivek S. Apr 17 '15 at 04:34

Gordon Linoff · Answer 3 · 2015-04-17T23:04:54.057

1

Use window functions:

select t.*
from (select t.*, count(*) over (partition by ascii_name, parent_id) as cnt
      from table t
     ) t
where cnt >= 2;

Under some circumstances, it might be a bit faster to use exists:

select t.*
from table t
where exists (select 1
              from table t2
              where t2.ascii_name = t.ascii_name and
                    t2.parent_id = t.parent_id and
                    t2.id <> t.id
             );

For performance, include an index on table(ascii_name, parent_id, id).
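
A minimal sketch of such an index, assuming the table is geo_nodes_copy as in the question's update. The index name geo_nodes_copy_dup_idx is made up for illustration. Running EXPLAIN ANALYZE on the query afterwards shows whether the planner actually uses the index and how long the query takes.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
CREATE INDEX geo_nodes_copy_dup_idx
    ON geo_nodes_copy (ascii_name, parent_id, id);

-- Inspect the chosen plan and the actual run time of the EXISTS query.
EXPLAIN ANALYZE
SELECT t.*
FROM geo_nodes_copy t
WHERE EXISTS (
    SELECT 1
    FROM geo_nodes_copy t2
    WHERE t2.ascii_name = t.ascii_name
      AND t2.parent_id = t.parent_id
      AND t2.id <> t.id
);
```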

edited Apr 17 '15 at 23:04

answered Apr 16 '15 at 10:46

The first solution doesn't work, but the second does! For the index, do you mean a combined index on the 3 columns? – Fed03 Apr 17 '15 at 16:01
@vivek . . . There was an off-by-one error in the first answer. – Gordon Linoff Apr 17 '15 at 23:05

score 0 · Answer 4 · answered Apr 16 '15 at 10:13

Assuming that a parent_id will always share the same ascii_name:

SELECT a.* 
FROM table a
WHERE a.ascii_name =
(SELECT b.ascii_name 
 FROM table b
 WHERE a.parent_id = b.parent_id)

answered Apr 16 '15 at 10:13

Matt

