Case insensitive duplicates SQL

Question

I have a users table in which the username column contains many duplicates that differ only in letter case, such as:

username and Username and useRnAme
john and John and jOhn

That was caused by a bug; each group of three records should have been a single record.

I'm trying to come up with a SQL query that lists all of these cases ordered by their creation date, so ideally the result should be something like this:

username jan01
useRnAme jan02
Username jan03
john     feb01 
John     feb02
jOhn     feb03

Any suggestions would be much appreciated.

@hdx: Your question is tagged `mysql` and `postgresql`. Are you using both? — Peter Lang, Apr 22 '10 at 20:12
@hdx: Are you actually storing the dates in that format and not in a date column? — Mark Byers, Apr 22 '10 at 20:13
@Peter Lang, in fact any sql like language would do, I can port it. I'm using postgresql. — hdx, Apr 22 '10 at 20:15
@Mark Byers it is in date format that was just a basic example — hdx, Apr 22 '10 at 20:16
@hdx: It might be better to make different queries for each database. Trying to write queries that work in all databases is usually a bad idea. — Mark Byers, Apr 22 '10 at 20:24

Larry Lustig · Accepted Answer · 2010-04-22T20:33:10.897

46

Leaving aside the issue of case sensitivity for a moment, the basic strategy is:

 SELECT username, create_date FROM your_table
     WHERE username IN 
     (SELECT username FROM your_table GROUP BY username HAVING COUNT(*) > 1)
 ORDER BY username, create_date

Many RDBMSes (including MySQL, assuming you are using CHAR or VARCHAR for the username column) perform case-insensitive comparisons by default. For those databases, the above solution will work. To handle case sensitivity in other products, wrap every occurrence of username except the first (the one in the SELECT list) in the uppercase conversion function specific to your RDBMS:

 SELECT username, create_date FROM your_table
     WHERE UPPER(username) IN 
     (SELECT UPPER(username) FROM your_table GROUP BY UPPER(username) HAVING COUNT(*) > 1)
 ORDER BY UPPER(username), create_date

edited Apr 22 '10 at 20:33

answered Apr 22 '10 at 20:15

Larry Lustig

If it's for MySQL, the UPPER is not needed and might even make the query unnecessarily slow. – Mark Byers Apr 22 '10 at 20:23
Yes, that's true (and true for various other RDBMSes as well). I'll modify the answer to reflect that. – Larry Lustig Apr 22 '10 at 20:31
is there a way to make sure the dates are in ascending order for each group of dups? – hdx Apr 22 '10 at 20:45
Did you include the ORDER BY clause, with the columns appropriately changed for your database? – Larry Lustig Apr 22 '10 at 20:57
OK, so I found the problem... we need to have "UPPER(username), create_date" at the very end. Thanks for the help! – hdx Apr 22 '10 at 22:32
Not sure I understand why that's so, but I'm glad you found the solution. – Larry Lustig Apr 23 '10 at 00:12
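
A possible explanation for the ORDER BY observation in the comments above: the case variants are distinct strings, so sorting on the raw username already fixes the position of every row and create_date never acts as a tiebreaker. Sorting on UPPER(username) makes all variants of a name tie on the first key, so create_date then orders the rows within each group. The sketch below checks this with Python's built-in sqlite3 module rather than PostgreSQL or MySQL. The sample rows, the year 2010, and the extra user alice are made up for illustration, and SQLite's default collation and ASCII-only UPPER() may behave differently from your database.

```python
import sqlite3

# Hypothetical sample data modelled on the question; "alice" has no duplicates.
rows = [
    ("username", "2010-01-01"),
    ("useRnAme", "2010-01-02"),
    ("Username", "2010-01-03"),
    ("john", "2010-02-01"),
    ("John", "2010-02-02"),
    ("jOhn", "2010-02-03"),
    ("alice", "2010-03-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (username TEXT, create_date TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)

# The query from the answer, with only the ORDER BY clause left variable.
# Note: SQLite's UPPER() only folds ASCII letters.
QUERY = """
SELECT username, create_date FROM your_table
WHERE UPPER(username) IN
    (SELECT UPPER(username) FROM your_table
     GROUP BY UPPER(username) HAVING COUNT(*) > 1)
ORDER BY {order_by}
"""

# Raw username as the first sort key: every value is unique, so
# create_date is never consulted.
by_raw = conn.execute(QUERY.format(order_by="username, create_date")).fetchall()

# Case-folded username as the first sort key: variants tie, so
# create_date decides the order inside each group.
by_upper = conn.execute(
    QUERY.format(order_by="UPPER(username), create_date")
).fetchall()

for name, created in by_upper:
    print(name, created)
```

With this data, the second ordering lists john, John, jOhn and then username, useRnAme, Username, each group in ascending date order, and alice is excluded because she has no duplicates.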

score 2 · Answer 2 · answered Apr 22 '10 at 20:18

Try something like this:

SELECT UserName, CreatedDate
FROM User
WHERE LOWER(TRIM(UserName)) IN 
(
SELECT LOWER(TRIM(UserName))
FROM User
GROUP BY LOWER(TRIM(UserName))
HAVING count(*) > 1
)

Christoph

Oops, I see Larry posted the same thing first – Christoph Apr 22 '10 at 20:20

score 0 · Answer 3 · answered Apr 22 '10 at 20:12

Use ToLower() or equivalent function in your SELECT, and order by that column.

3Dave

That will include usernames that do not suffer from the multi-entry problem. – Larry Lustig Apr 22 '10 at 20:16

score 0 · Answer 4 · answered Apr 22 '10 at 20:16

In MySQL, a case-sensitive compare is done using a binary collation. So you could join the table on itself, looking for rows where the case sensitive compare is different from the case insensitive compare:

select *
from YourTable t1
inner join YourTable t2 
on t1.name <> t2.name collate latin1_bin
and t1.name = t2.name

score 0 · Answer 5 · edited May 23 '17 at 06:49

SELECT UserName, CreatedDate
FROM YourTable 
WHERE UserName COLLATE UTF8_BIN != LOWER(UserName COLLATE UTF8_BIN)
GROUP BY UserName, CreatedDate
HAVING COUNT(*) > 1

cske

answered May 23 '17 at 03:53

ShadowTK

**From review queue**: May I request you to please add some context around your source-code. Code-only answers are difficult to understand. It will help the asker and future readers both if you can add more information in your post. – RBT May 23 '17 at 08:00

score 0 · Answer 6 · answered Feb 07 '22 at 14:53

This is what I came up with. It was written against a PostgreSQL database but should also work on other SQL engines.

select * from user u join user u2
on upper(u.email)=upper(u2.email) where u.id != u2.id
order by u.email;

The query assumes that the emails are duplicated but the ids are not, so it pulls records that share an email (compared case-insensitively) but have different ids.
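
One thing to watch with a self-join like this: comparing ids with != returns every duplicate pair twice, once in each direction. A minimal sketch of a variant that keeps one row per pair by using < instead, run here through Python's built-in sqlite3 module rather than PostgreSQL. The table name users (chosen to avoid `user`, which is a reserved word in PostgreSQL), the columns, and the sample emails are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data for illustration only.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "john@example.com"),
        (2, "John@Example.com"),
        (3, "alice@example.com"),
    ],
)

# u.id < u2.id keeps a single row per duplicate pair; with u.id != u2.id
# the mirrored pair (2, 1) would be returned as well.
pairs = conn.execute(
    """
    SELECT u.id, u.email, u2.id, u2.email
    FROM users u
    JOIN users u2 ON UPPER(u.email) = UPPER(u2.email)
    WHERE u.id < u2.id
    ORDER BY UPPER(u.email), u.id, u2.id
    """
).fetchall()

print(pairs)
```

For a group of three or more variants this still returns one row per pair, so the GROUP BY ... HAVING COUNT(*) > 1 approach from the accepted answer may be easier to read when groups are large.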
