219

I need to run a select statement that returns all rows where the value of a column is not distinct (e.g. EmailAddress).

For example, if the table looks like below:

CustomerName     EmailAddress
Aaron            aaron@gmail.com
Christy          aaron@gmail.com
Jason            jason@gmail.com
Eric             eric@gmail.com
John             aaron@gmail.com

I need the query to return:

Aaron            aaron@gmail.com
Christy          aaron@gmail.com
John             aaron@gmail.com

I have read many posts and tried different queries to no avail. The query that I believe should work is below. Can someone suggest an alternative or tell me what may be wrong with my query?

select EmailAddress, CustomerName from Customers
group by EmailAddress, CustomerName
having COUNT(distinct(EmailAddress)) > 1
ArianJM
Grasshopper

7 Answers

356

This is significantly faster than the EXISTS way:

SELECT [EmailAddress], [CustomerName] FROM [Customers] WHERE [EmailAddress] IN
  (SELECT [EmailAddress] FROM [Customers] GROUP BY [EmailAddress] HAVING COUNT(*) > 1)
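To see how the two parts fit together, here is a minimal sketch that runs the same query against the sample data from the question. It assumes an in-memory SQLite database via Python's `sqlite3` module (not the asker's actual engine), drops the square brackets, and adds an `ORDER BY` only to make the output deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@gmail.com"),
        ("Christy", "aaron@gmail.com"),
        ("Jason", "jason@gmail.com"),
        ("Eric", "eric@gmail.com"),
        ("John", "aaron@gmail.com"),
    ],
)

# Step 1: the inner query on its own lists each address used more than once.
duplicated = conn.execute(
    """
    SELECT EmailAddress FROM Customers
    GROUP BY EmailAddress HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: the outer query keeps every row whose address is in that list.
in_rows = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers
    WHERE EmailAddress IN
      (SELECT EmailAddress FROM Customers
       GROUP BY EmailAddress HAVING COUNT(*) > 1)
    ORDER BY CustomerName
    """
).fetchall()

print(duplicated)  # [('aaron@gmail.com',)]
for row in in_rows:
    print(row)     # one line each for Aaron, Christy and John
```

The inner query is evaluated against the table once, and the outer query is then a plain filter on its result.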
Serj Sagan
  • 2
    Hey, I know this answer is 7 years old, but if you're still around would you mind explaining how it works? Solved my problem as well! – Lou Dec 06 '19 at 18:04
  • 7
    Using a `HAVING` here instead of a second `SELECT...WHERE` causes this to be a single query, instead of the second option which executes that second `SELECT...WHERE` call many times. See more here: https://stackoverflow.com/q/9253244/550975 – Serj Sagan Dec 06 '19 at 18:38
  • I get the infamous `[EmailAddress] must appear in the GROUP BY clause or be used in an aggregate function` error. Is the only fix - editing the `sql_mode`? – Volodymyr Bobyr Jul 16 '20 at 16:21
  • Well, in the query above, `[EmailAddress]` IS in the `GROUP BY` – Serj Sagan Feb 02 '23 at 21:25
71

The problem with your query is that you are grouping by both email and name. That forms one group for each unique combination of email and name, and hence

aaron and aaron@gmail.com
christy and aaron@gmail.com
john and aaron@gmail.com

are treated as 3 different groups rather than all belonging to a single group.

Use the query below instead:

select emailaddress,customername from customers where emailaddress in
(select emailaddress from customers group by emailaddress having count(*) > 1)
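The grouping problem described above can be checked directly. This is a small sketch, assuming an in-memory SQLite database (via Python's `sqlite3`) loaded with the sample rows from the question. It runs the original query and then lists the groups it produces. Note also that `COUNT(DISTINCT EmailAddress)` can never exceed 1 inside a group that is keyed on `EmailAddress`, so that `HAVING` condition cannot pass even after the grouping is fixed.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@gmail.com"),
        ("Christy", "aaron@gmail.com"),
        ("Jason", "jason@gmail.com"),
        ("Eric", "eric@gmail.com"),
        ("John", "aaron@gmail.com"),
    ],
)

# The query from the question: it returns no rows at all.
original_rows = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers
    GROUP BY EmailAddress, CustomerName
    HAVING COUNT(DISTINCT EmailAddress) > 1
    """
).fetchall()

# The groups it builds: one per (email, name) pair, each holding a single row.
groups = conn.execute(
    """
    SELECT EmailAddress, CustomerName,
           COUNT(*), COUNT(DISTINCT EmailAddress)
    FROM Customers
    GROUP BY EmailAddress, CustomerName
    """
).fetchall()

print(original_rows)
for group in groups:
    print(group)
```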
Seasoned
  • 29
    I like that you also included an explanation about what is wrong with the original query, unlike the accepted answer. –  Feb 05 '16 at 08:20
24
select CustomerName,count(1) from Customers group by CustomerName having count(1) > 1
Nisar
  • minor enhancement to show count as "dups": `select CustomerName,count(1) as dups from Customers group by CustomerName having count(1) > 1` – DynamicDan May 15 '15 at 11:03
13

How about

SELECT EmailAddress, CustomerName FROM Customers a
WHERE Exists ( SELECT emailAddress FROM customers c WHERE a.customerName != c.customerName AND a.EmailAddress = c.EmailAddress)
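One edge case worth knowing: because the match is on `customerName != c.customerName`, two rows that share both the name and the address are not reported. The sketch below illustrates this under assumptions of my own: an in-memory SQLite database via Python's `sqlite3`, the sample rows from the question, plus two hypothetical "Sam" rows that are not in the original data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@gmail.com"),
        ("Christy", "aaron@gmail.com"),
        ("Jason", "jason@gmail.com"),
        ("Eric", "eric@gmail.com"),
        ("John", "aaron@gmail.com"),
        # Hypothetical rows: same name AND same address, entered twice.
        ("Sam", "sam@gmail.com"),
        ("Sam", "sam@gmail.com"),
    ],
)

# EXISTS version: needs another row with the same address but a different name.
exists_rows = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers a
    WHERE EXISTS (SELECT 1 FROM Customers c
                  WHERE a.CustomerName != c.CustomerName
                    AND a.EmailAddress = c.EmailAddress)
    ORDER BY CustomerName
    """
).fetchall()

# GROUP BY / HAVING version: counts rows per address, whatever the names are.
having_rows = conn.execute(
    """
    SELECT EmailAddress, CustomerName FROM Customers
    WHERE EmailAddress IN
      (SELECT EmailAddress FROM Customers
       GROUP BY EmailAddress HAVING COUNT(*) > 1)
    ORDER BY CustomerName
    """
).fetchall()

print(exists_rows)  # the two Sam rows are missing here
print(having_rows)  # the two Sam rows are included here
```

If a unique key such as a customer ID exists, comparing on that key instead of the name avoids the gap.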
Marc
11

Just for fun, here's another way:

;with counts as (
    select CustomerName, EmailAddress,
      count(*) over (partition by EmailAddress) as num
    from Customers
)
select CustomerName, EmailAddress
from counts
where num > 1
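A quick way to try the window-function version is the sketch below. It assumes an in-memory SQLite database through Python's `sqlite3` module with a bundled SQLite of 3.25 or newer (the first release with window functions), uses the sample rows from the question, omits the leading semicolon, and also selects `num` so the per-address count is visible.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.25+ so COUNT(*) OVER (...) is available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@gmail.com"),
        ("Christy", "aaron@gmail.com"),
        ("Jason", "jason@gmail.com"),
        ("Eric", "eric@gmail.com"),
        ("John", "aaron@gmail.com"),
    ],
)

# Each row is tagged with the number of rows sharing its address;
# the outer query keeps only rows whose address occurs more than once.
window_rows = conn.execute(
    """
    WITH counts AS (
        SELECT CustomerName, EmailAddress,
               COUNT(*) OVER (PARTITION BY EmailAddress) AS num
        FROM Customers
    )
    SELECT CustomerName, EmailAddress, num
    FROM counts
    WHERE num > 1
    ORDER BY CustomerName
    """
).fetchall()

for row in window_rows:
    print(row)
```

The table is referenced only once here, which is the point of this variant.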
Chad
  • 2
    +1 for the CTE version. We shouldn't repeat ourselves in code, so why repeat ourselves in SQL if we don't have to anymore? – yzorg Aug 17 '16 at 14:25
  • 2
    I use _count for the count column (over num). I consistently use underscore when columns happen to collide with SQL keywords like _default, _type, _sum, etc. – yzorg Aug 17 '16 at 14:26
  • Loved it, much cleaner, and also the only one that was working across multiple versions of MariaDB – Corentin Le Fur May 16 '22 at 15:41
4

Subqueries in the WHERE condition can increase query time when the table holds a large number of records.

I would suggest an inner join as a better option for this problem.

For the same table, this gives the result:

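-- Columns are qualified with the alias a, since unqualified names are ambiguous in a self-join.
-- DISTINCT collapses the repeats produced when an address is shared by more than two customers.
SELECT DISTINCT a.EmailAddress, a.CustomerName FROM Customers AS a
INNER JOIN Customers AS b ON a.CustomerName <> b.CustomerName AND a.EmailAddress = b.EmailAddress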

For more reliable results, I would suggest comparing on CustomerID or any other unique field of your table instead of CustomerName, because two customers can have the same name.
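One thing to watch with a self-join: it returns one row per matching pair, so a customer who shares an address with two others comes back twice unless the result is de-duplicated. The sketch below shows the difference, assuming an in-memory SQLite database via Python's `sqlite3` and the sample rows from the question. Columns are qualified with the alias `a` because unqualified names are ambiguous in a self-join.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Aaron", "aaron@gmail.com"),
        ("Christy", "aaron@gmail.com"),
        ("Jason", "jason@gmail.com"),
        ("Eric", "eric@gmail.com"),
        ("John", "aaron@gmail.com"),
    ],
)

join_sql = """
    SELECT {distinct} a.EmailAddress, a.CustomerName
    FROM Customers AS a
    INNER JOIN Customers AS b
      ON a.CustomerName <> b.CustomerName
     AND a.EmailAddress = b.EmailAddress
    ORDER BY a.CustomerName
"""

# Without DISTINCT: each of the three customers pairs with the two others.
plain_join = conn.execute(join_sql.format(distinct="")).fetchall()

# With DISTINCT: one row per customer.
distinct_join = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(len(plain_join))     # 6
print(len(distinct_join))  # 3
```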

BillyBarbarIan
0
SELECT Title, Id
FROM dbo.TblNews
WHERE Title IN
      (SELECT Title
       FROM dbo.TblNews AS TblNews_1
       GROUP BY Title
       HAVING COUNT(*) > 1)
ORDER BY Title
  • sorted by title
mirazimi