-2

I am trying to find the IDs that have duplicate emails due to case sensitivity.

For below table

ID      Email
101     example@gmail.com
101     EXAMPLE@gmail.com
102     email@gmail.com
102     email@gmail.com
103     la@gmail.com
103     sf@yahoo.com

The output should give below

ID      Email 
101     example@gmail.com
101     EXAMPLE@gmail.com

Only ID 101 and its email addresses should be displayed because that's the only ID that has duplicate emails due to case sensitivity.

Thank you in advance!

EZ72
  • 61
  • 7
  • Also see [delete duplicates](https://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql) and consider changing the email column to a case insensitive collation with a unique key to prevent the addition of duplicates. – danblack Jul 08 '22 at 00:04

3 Answers3

3

Use a self-join that performs both case-sensitive and case-insensitive comparisons.

(note - edited for Snowflake)


    SELECT a.id, a.email
    FROM yourTable a
    JOIN yourTable b 
        ON a.id = b.id 
        AND collate(a.email,'en-ci') = collate(b.email,'en-ci') 
        AND a.email != b.email;

Jim Demitriou
  • 593
  • 4
  • 8
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • thank you for your help. I am trying to get this done in the Snowflake database. I do not know the differences between utf8 formats. However, if I use both conditions as you provided, I am getting an empty table. For my case, should I use utf8mb4_general_ci or utf8mb4_bin? – EZ72 Jul 08 '22 at 10:47
  • @EZ72 If you are uysing snowflake, why did you tag MySQL in the first place? Close this question and ask a new one. – The Impaler Jul 08 '22 at 12:52
  • @TheImpaler it does not let me delete the question as people started to put the effort into it. I changed the tags. Sorry for the inconvenience. – EZ72 Jul 08 '22 at 14:21
  • @Barmar I just added two rows for example which is going to represent my population better. Thank you! – EZ72 Jul 08 '22 at 14:46
  • 1
    I'm not familiar with Snowflake, I don't know how to translate this. But the basic idea should be the same -- use a self-join where the emails match case-insensitively, but not case-sensitively. – Barmar Jul 08 '22 at 15:01
  • A minor adjustment to the code would make this approach Snowflake compatible - you can user collate(a.email, 'en-ci') = collate(b.email, 'en-ci') for the case insensitive comparison. You don't need a collate for case sensitivity, that's the default (assuming it wasn't overridden in the table definition). – Jim Demitriou Jul 09 '22 at 04:22
1

You can do:

select * 
from t
where lower(email) in (
  select lower(email)
  from t
  group by lower(email)
  having min(email) <> max(email)
)
The Impaler
  • 45,731
  • 9
  • 39
  • 76
0

The self-join suggestion was helpful. This should work!

SELECT
      distinct a.id
     ,a.email
FROM table1 AS a
JOIN table1 AS b 
    ON a.id = b.id 
WHERE a.email != b.email
     AND lower(a.email) = lower(b.email)
EZ72
  • 61
  • 7