finding duplicates rows for specified columns

Question

I have data like below where i need to find the ids which are having same values(count of values should also same). Here id 1 and 3 are having same values with different id so i want these 2 ids as out put. Id is 4 is also having same links but it is also having different value 333 so i don't want that.

id  value
1   111
1   222

2   222
2   333

3   111
3   222

4   111
4   222
4   333

5       111
5       444

I tried using exists query but it is also returning 5 which is wrong.I am using oracle 11G schema.

Output expecting is something like

giving some number to the same ids to identify the duplicates.

so for

1 111 1 1 222 1 3 111 1 3 111 1

Can you show the output you expect? – Arion Jun 16 '14 at 06:49 — Arion, Jun 16 '14 at 06:49

score 0 · Accepted Answer · answered Jun 16 '14 at 07:12

You need SET operations for this. One idea for testing set equality is this: A MINUS B EQUALS B MINUS A EQUALS EMPTY SET.

Here is how you can do it with PL/SQL.

DECLARE
  v_count_1 number;
  v_count_2 number;
BEGIN
  FOR r IN (SELECT DISTINCT id FROM my_table) LOOP
    FOR r_inner in (SELECT DISTINCT id FROM my_table WHERE id > r.id) LOOP
      SELECT count(*)
      INTO v_count_1
      FROM (SELECT value FROM my_table WHERE id = r.id
            MINUS
            SELECT value FROM my_table WHERE id = r_inner.id);

      SELECT count(*)
      INTO v_count_2
      FROM (SELECT value FROM my_table WHERE id = r_inner.id
            MINUS
            SELECT value FROM my_table WHERE id = r.id);

      IF v_count_1 = 0 AND v_count_2 = 0 THEN
        dbms_output.put_line('duplicate IDs: '||r.id||' '||r_inner.id);
      END IF;
    END LOOP;
  END LOOP;
END;
/

score 0 · Answer 2 · answered Jun 16 '14 at 09:04

If only the IDs that match all the values of each other is the part of interest, one way to go is to check how many value each ID has then match those with the same count and check how many value are matched

With R AS (
  SELECT id, Count(1) Dim
  FROM   Table1
  GROUP BY id
)
SELECT r1.id id_1, r2.id id_2
FROM   R r1
       INNER JOIN R r2 ON r1.dim = r2.dim
       INNER JOIN Table1 t1 ON r1.id = t1.id
       INNER JOIN Table1 t2 ON r2.id = t2.id AND t1.value = t2.value 
              AND t2.id > t1.id
GROUP BY r1.id, r2.id
HAVING COUNT(1) = MAX(r1.dim);

the t2.id > t1.id condition is to avoid the pair to be repeated

To get a resultset more similar to the one in the question the previous query can be used as a base to JOIN again with the table

With R AS (
  SELECT id, Count(1) Dim
  FROM   Table1
  GROUP BY id
), M AS (
  SELECT r1.id id_1, r2.id id_2
  FROM   R r1
         INNER JOIN R r2 ON r1.dim = r2.dim
         INNER JOIN Table1 t1 ON r1.id = t1.id
         INNER JOIN Table1 t2 ON r2.id = t2.id AND t1.value = t2.value 
                AND t2.id > t1.id
  GROUP BY r1.id, r2.id
  HAVING COUNT(1) = MAX(r1.dim)
)
SELECT t1.id id_1, t1.value value_1, t1.id base_value
     , t2.id id_2, t2.value value_2, t1.id base_value_2
FROM   M
       INNER JOIN Table1 t1 ON m.id_1 = t1.id
       INNER JOIN Table1 t2 ON m.id_2 = t2.id AND t1.value = t2.value

SQLFiddle demo with both the queries.

Even if the demo is in SQLServer 2012, the queries were written using only standard command.

If the OP uses MySQL the part within WITH need to be converted to subqueries in the FROM clause.

finding duplicates rows for specified columns

2 Answers2