18

So I have a users table where the user.username has many duplicates like:

username and Username and useRnAme
john and John and jOhn

That was a bug and these three records should have been only one.

I'm trying to come up with a SQL query that lists all of these cases ordered by their creation date, so ideally the result should be something like this:

username jan01
useRnAme jan02
Username jan03
john     feb01 
John     feb02
jOhn     feb03

Any suggestions will be much appreciated

Peter Lang
  • 54,264
  • 27
  • 148
  • 161
hdx
  • 4,198
  • 7
  • 26
  • 33

6 Answers6

46

Leaving aside the issue of case sensitivity for a moment, the basic strategy is:

 SELECT username, create_date FROM your_table
     WHERE username IN 
     (SELECT username FROM your_table GROUP BY username HAVING COUNT(*) > 1)
 ORDER BY username, create_date

Many RDBMSes (including MySQL assuming that you are using CHAR or VARCHAR for the username column), perform case-insensitive searching by default. For those databases, the above solution will work. To solve the case sensitivity issue for other products , wrap all except the first occurrence of username in the uppercase conversion function specific to your RDBMS:

 SELECT username, create_date FROM your_table
     WHERE UPPER(username) IN 
     (SELECT UPPER(username) FROM your_table GROUP BY UPPER(username) HAVING COUNT(*) > 1)
 ORDER BY username, create_date
Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
2

Try something like these

SELECT UserName, CreatedDate
FROM User
WHERE LOWER(TRIM(UserName)) IN 
(
SELECT LOWER(TRIM(UserName))
FROM User
GROUP BY LOWER(TRIM(UserName))
HAVING count(*) > 1
)
Christoph
  • 4,251
  • 3
  • 24
  • 38
0

Use ToLower() or equivalent function in your SELECT, and order by that column.

3Dave
  • 28,657
  • 18
  • 88
  • 151
0

In MySQL, a case-sensitive compare is done using a binary collation. So you could join the table on itself, looking for rows where the case sensitive compare is different from the case insensitive compare:

select *
from YourTable t1
inner join YourTable t2 
on t1.name <> t2.name collate latin1_bin
and t1.name = t2.name
Andomar
  • 232,371
  • 49
  • 380
  • 404
0
SELECT UserName, CreatedDate
FROM YourTable 
WHERE UserName COLLATE UTF8_BIN != LOWER(UserName COLLATE UTF8_BIN)
GROUP BY UserName, CreatedDate
HAVING COUNT(*) > 1
cske
  • 2,233
  • 4
  • 26
  • 24
ShadowTK
  • 101
  • 2
  • **From review queue**: May I request you to please add some context around your source-code. Code-only answers are difficult to understand. It will help the asker and future readers both if you can add more information in your post. – RBT May 23 '17 at 08:00
0

so this is what i came up with. this was written against a postgres db but should work fine still against other sql engine.

select * from user u join user u2
on upper(u.email)=upper(u2.email) where u.id != u2.id
order by u.email;

so the query assume that the email are duplicate but the ids are not so it is looking to pull records with a duplicate email (case insensitive) but with unique id

Badmous
  • 95
  • 1
  • 5