7

I have searched for Find duplicate entries in a column and Oracle: find duplicate rows in select query, but couldn't seem to get any answer...

I have some data that looks like this

columnA    columnB    columnC
1111111    emailID1   true
1111111    emailID2   false
1111222    emailID3   true
1111339    emailID4   true
2384398    emailID5   true

I would like to only display these column that has the same values in columnA but can be different in columnB and/or C:

columnA    columnB    columnC
1111111    emailID1   true
1111111    emailID2   false

Using the having >1 doesn't really seem to capture this, any ideas? Thanks.

Community
  • 1
  • 1
prog rice bowl
  • 788
  • 3
  • 19
  • 36

7 Answers7

7

Using having count(*) > 1 is just one half of the puzzle. The other half is getting the corresponding rows.

You can do it like this:

SELECT *
FROM MyTable
WHERE ColumnA IN (
    SELECT ColumnA FROM MyTable GROUP BY ColumnA HAVING COUNT(*) > 1
)
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
5

Try this:

SELECT t.*
FROM (SELECT ColumnA FROM MyTable GROUP BY ColumnA HAVING COUNT(*) > 1) dups
JOIN MyTable t ON t.ColumnA = dups.ColumnA

This will scale well too, as long as an index is on ColumnA:

create index MyTable_ColumnA on MyTable(ColumnA);

such an index would be used for both the main query and the inner query, giving you very good performance.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 1
    +1 Although I like the looks of an `IN` a little better, I 100% agree that a `JOIN` makes you less dependent on the optimizer. – Sergey Kalinichenko Aug 29 '13 at 03:48
  • 1
    Of course not - in this case the suggestion is to join the one table to itself. The inner query identifies the duplicate rows, the outer query gathers the other columns that you wanted in the result. – Jeffrey Kemp Aug 29 '13 at 04:28
  • 1
    @progricebowl if you execute this query you will find it gives you the result you seek. The join is being made between the list of duplicate ColumnA values back to the table itself to give you the rows. I firmly believe this query will give you the best possible performance of any query. – Bohemian Aug 29 '13 at 05:34
  • @Bohemian I tried this query and others but I always get "not a group by expression" ? – David Trevor Aug 12 '16 at 08:25
  • @taclight what database and version are you using? – Bohemian Aug 12 '16 at 09:29
  • @Bohemian 12.1.0.2 – David Trevor Aug 12 '16 at 11:28
  • @taclight what us your full query? Omit irrelevant bits if it doesn't fit in a comment – Bohemian Aug 12 '16 at 13:56
  • @Bohemian yours as well as `select column_name, count(column_name) from table group by column_name having count (column_name) > 1;` it's working on the ID columns but not on the name, why? – David Trevor Aug 15 '16 at 06:15
1

I usually like to avoid hitting the table more than once in the query - this will work well even without an index - doing only one scan over the table:

SELECT columnA, columnB, columnC
FROM (SELECT mytable.*
            ,COUNT(*) OVER (PARTITION BY columnA) countA
      FROM mytable)
WHERE countA > 1
Jeffrey Kemp
  • 59,135
  • 14
  • 106
  • 158
0
SELECT T.columnA, S.columnB, S.columnC FROM
(
SELECT columnA FROM someTable 
GROUP BY columnA
HAVING COUNT(*) > 1
) T INNER JOIN someTable S ON T.columnA = S.columnA 
realnumber3012
  • 1,062
  • 6
  • 10
0

You can do this using analytic functions. Find the min and max and return the rows where they are different:

select columnA, columnB, columnC
from (select t.*,  min(t.columnC) over (partition by columnA, columnB) as minC,
             max(t.columnC) over (partition by columnA, columnB) as maxC
      from t
     ) t
where minC <> maxC;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

This thread might be old but it is worth to update everyone the better/efficient solution to find duplicate records. You can use partition to find duplicate records on matching columns (as many column as you need without using inner join).

SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY your_key_column) AS duplicate_count
        FROM    yourtable t
        )
WHERE   duplicate_count > 1 --get duplicate records for the matching value in column1 and column2

See original answer from @Quassnoi at here. Thanks for him, a very clever solution by using partition.

Community
  • 1
  • 1
Estin Chin
  • 781
  • 7
  • 14
0

If you are looking for entries with unique id's in your database where multiple keys in a column can occur, then a simple way finding them is to create two tables like exlplained below:

Here: TICKETID is a primary key, TKTNUMBER can occur multiple times.

CREATE TABLE TEMP
(
   TICKETID    FLOAT,
   TKTNUMBER   FLOAT
);

CREATE TABLE TEMP2
(
   TKTNUMBER   FLOAT,
   COUNTER     INTEGER
);

Put in all the TICKETID's and TKTNUMBER's by looking only on the TKTNUMBERS with COUNT(TKTNUMBER)>1:

INSERT INTO TEMP
   SELECT 
       TICKETID, 
       TKTNUMBER
   FROM YOUR_TABLE
   WHERE TKTNUMBER IN (  
            SELECT TKTNUMBER
            FROM YOUR_TABLE
            HAVING COUNT (TKTNUMBER) > 1
            GROUP BY TKTNUMBER);

Finally, to see the counter, put in the TKTNUMBER and COUNT the same way as above:

INSERT INTO TEMP2
    SELECT 
        TKTNUMBER, 
        COUNT (TKTNUMBER) AS COUNTER
    FROM YOUR_TABLE
    HAVING COUNT (TKTNUMBER) > 1
    GROUP BY TKTNUMBER
    ORDER BY 2 DESC

You can select as follows (by joining the two tables on tktnumber):

SELECT 
    T1.TICKETID,
    T1.TKTNUMBER,
    T2.COUNTER
FROM 
    TEMP T1 INNER JOIN 
    TEMP2 T2 ON 
        T2.TKTNUMBER = T1.TKTNUMBER
ORDER BY T2.COUNTER DESC
V. Wolf
  • 123
  • 1
  • 8