Show all rows that have certain columns duplicated

Question

suppose I have following sql table

    objid  firstname lastname active
     1       test      test     0
     2       test      test     1
     3       test1     test1    1
     4       test2     test2    0
     5       test2     test2    0
     6       test3     test3    1

Now, the result I am interested in is as follows:

     objid  firstname lastname active
     1       test      test     0
     2       test      test     1
     4       test2     test2    0
     5       test2     test2    0

How can I achieve this? I have tried the following query,

select firstname,lastname from table
group by firstname,lastname
having count(*) > 1

But this query gives results like

    firstname  lastname
     test        test
     test2       test2

score 61 · Accepted Answer · edited May 23 '17 at 12:09

61

You've found your duplicated records but you're interested in getting all the information attached to them. You need to join your duplicates to your main table to get that information.

select *
  from my_table a
  join ( select firstname, lastname 
           from my_table 
          group by firstname, lastname 
         having count(*) > 1 ) b
    on a.firstname = b.firstname
   and a.lastname = b.lastname

This is the same as an inner join and means that for every record in your sub-query, that found the duplicate records you find everything from your main table that has the same firstseen and lastseen combination.

You can also do this with in, though you should test the difference:

select *
  from my_table a
 where ( firstname, lastname ) in   
       ( select firstname, lastname 
           from my_table 
          group by firstname, lastname 
         having count(*) > 1 )

does this syntax work in sql server? where ( firstname, lastname ) in ( select firstname, lastname from my_table group by firstname, lastname having count(*) > 1 ) – nee21 Sep 29 '17 at 20:28
Have you run it @nee21? What problems did you have? – Ben Sep 30 '17 at 05:25
1

Yes, I get this error: Msg 4145, Level 15, State 1, Line 161 An expression of non-boolean type specified in a context where a condition is expected, near ','. I am not sure if am missing anything. – nee21 Oct 03 '17 at 22:50
Hi @Ben, did u get a chance to check this? – nee21 Jan 19 '18 at 21:22
Have tried the same in W3Schools try it editor, itsn't working – arjun May 10 '18 at 06:12
That must be an issue with w3schools @arjun. Here is the query executing fine: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b5a5256340e8142a436f146605861c4f – Ben May 10 '18 at 08:35
Both queries worked in MYSQL. There are different results. The `inner join` version give me both rows, which was what I need, so _good_. The nested query only give me the first / original row. Is there a means to actually get the second row (in `id` order), which are the _duplicate_-s I wan to hit? – will May 14 '18 at 01:01
There's no such thing as a first/original row @will - it's completely random without an explicit ORDER BY. I'm very surprised if the IN only shows one row, the only way these differ is in the way they deal with NULLs. Could you create a dbfiddle which demonstrates your problem? – Ben May 14 '18 at 05:48
I thought in clause only support 1 column. isn't it? – CSK May 06 '19 at 14:57
[`in` supports any number of fields up to the maximum packet size](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#function_in) @CSK – Ben May 06 '19 at 21:59
1

i don't think it works on sql server @Ben. op put the sql server 2008 as label – CSK May 07 '19 at 22:29

Dmytro Shevchenko · Answer 2 · 2012-04-25T21:48:43.050

8

SELECT DISTINCT t1.*
FROM myTable AS t1
INNER JOIN myTable AS t2
  ON t1.firstname = t2.firstname
  AND t1.lastname = t2.lastname
  AND t1.objid <> t2.objid

This will output every row which has a duplicate, basing on firstname and lastname.

edited Apr 25 '12 at 21:48

answered Apr 25 '12 at 21:36

Dmytro Shevchenko

33,431
6
51
67

2

instead of `id` you might have meant `objid` – vol7ron Apr 25 '12 at 21:43
3

Additionally, if you don't distinct the results you'll get duplicates. – Mosty Mostacho Apr 25 '12 at 21:47

score 6 · Answer 3 · answered Apr 25 '12 at 21:48

6

Here's a little more legible way to do Ben's first answer:

WITH duplicates AS (
   select    firstname, lastname
   from      my_table
   group by  firstname, lastname
   having    count(*) > 1
)
SELECT    a.*
FROM      my_table   a
JOIN      duplicates b ON (a.firstname = b.firstname and a.lastname = b.lastname)

answered Apr 25 '12 at 21:48

vol7ron

40,809
21
119
172

1

Wouldn't a simple join (like in my answer) be faster than joining to a grouped temporary table? – Dmytro Shevchenko Apr 25 '12 at 21:50
1

@Shedal: they should be the same thing. A subquery is a temporary table. The above is a way of simplifying reading the SQL. By doing your declaring/defining your subqueries up front, you're able to concentrate on the heart of the SQL that follows – vol7ron Apr 25 '12 at 21:54
@Shedal, it depends. If there's an index on `firstname, lastname` for instance ( I +1'd you though as it's just a different way of doing things ). – Ben Apr 25 '12 at 21:55
@Ben well there should be an index on `firstname, lastname` anyway, for both queries to run fast. – Dmytro Shevchenko Apr 25 '12 at 21:56
@Shedal, the sub-query will only use the index though, the join will have to either use two indexes ( unless it's indexed on `obj_id, fn, ln`) or enter the table. Plus there's no need to do a distinct. Without testing and knowing the selectivity of the columns it's impossible to tell which'll be faster. – Ben Apr 25 '12 at 22:01
@Ben: I think when it comes to your and Shedal's answer, when it comes to speed it depends on a few things (index, columns, etc), but also the number of records. If there are only a few duplicates, your method should be much faster. – vol7ron Apr 26 '12 at 03:19
I should also mention that there are cases where the Postgres query planner does not optimize the WITH queries. So there is a chance for some discrepancies in the performance between the two seemingly equivalent methods. – vol7ron Nov 18 '19 at 04:10

score 6 · Answer 4 · edited Mar 28 '18 at 12:32

6

SELECT user_name,email_ID 
FROM User_Master WHERE 
email_ID 
in (SELECT email_ID 
FROM User_Master GROUP BY 
email_ID HAVING COUNT(*)>1)

edited Mar 28 '18 at 12:32

Parker

7,244
12
70
92

answered Mar 28 '18 at 11:37

Mahesh Bharati

61
1
1

1

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – Thomas Flinkow Mar 28 '18 at 11:38

score 1 · Answer 5 · answered Sep 26 '14 at 17:44

1

nice option get all duplicated value from tables

 select * from Employee where Name in (select Name from Employee group by Name having COUNT(*)>1)

answered Sep 26 '14 at 17:44

Jeetendra singh negi

19
1

score 1 · Answer 6 · answered Jun 20 '16 at 09:05

1

This is the easiest way:

SELECT * FROM yourtable a WHERE EXISTS (SELECT * FROM yourtable b WHERE a.firstname = b.firstname AND a.secondname = b.secondname AND a.objid <> b.objid)

answered Jun 20 '16 at 09:05

Riccardo

11
2

score 1 · Answer 7 · edited Sep 18 '16 at 00:25

1

If you want to print all duplicate IDs from the table:

select * from table where id in (select id from table group By id having count(id)>1)

edited Sep 18 '16 at 00:25

Sebastian Lenartowicz

4,695
4
28
39

answered Sep 17 '16 at 04:04

pooja

11
1

score 1 · Answer 8 · answered May 03 '19 at 18:02

I'm surprised that there is no answer using Window function. I just came across this use case and this helped me.

select t.objid, t.firstname, t.lastname, t.active
from
(
select t.*, count(*) over (partition by firstname, lastname) as cnt
from my_table t
) t
where t.cnt > 1;

Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c0cc3b679df63c4d7d632cbb83a9ef13

The format goes like

select
    tbl.relevantColumns
from
(
    select t.*, count(*) over (partition by key_columns) as cnt
    from desiredTable t
) as tbl
where tbl.cnt > 1;

This format selects whatever columns you require from the table (sometimes all columns) where the count > 1 for the key_columns being used to identify the duplicate rows. key_columns can be any number of columns.

score 0 · Answer 9 · answered Mar 27 '14 at 05:56

0

This answer may not be great one, but I think it is simple to understand.

SELECT * FROM table1 WHERE (firstname, lastname) IN ( SELECT firstname, lastname FROM table1 GROUP BY firstname, lastname having count() > 1);

answered Mar 27 '14 at 05:56

Santhi Kabir

291
1
3
8

Mohammed Safeer · Answer 10 · 2014-10-11T19:22:40.260

This Query returns dupliacates

SELECT * FROM (
  SELECT  a.* 
    FROM table a 
    WHERE (`firstname`,`lastname`) IN (
        SELECT `firstname`,`lastname` FROM table 
        GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1       
        )  
    )z WHERE z.`objid` NOT IN (
        SELECT MIN(`objid`) FROM table 
        GROUP BY `firstname`,`lastname` HAVING COUNT(*)>1
        )

score 0 · Answer 11 · answered Apr 07 '21 at 15:11

0

Please try

WITH cteTemp AS (
  SELECT EmployeeID, JoinDT,
     row_number() OVER(PARTITION BY EmployeeID, JoinDT ORDER BY EmployeeID) AS [RowFound]
  FROM dbo.Employee 
)
SELECT * FROM cteTemp WHERE [RowFound] > 1 ORDER BY JoinDT

answered Apr 07 '21 at 15:11

Md. Nazmul Nadim

207
2
5

Show all rows that have certain columns duplicated

11 Answers11

Further Reading:

Linked

Related