1

I have the following query to basically find all duplicates in my username column:

SELECT  `username` 
FROM  `instagram_user` 
GROUP BY  `username` 
HAVING COUNT( * ) >1

How do I remove all the duplicates, such that it will only leave me with one unique username in the table? I don't care which entity it is that is persisted or removed, as long as there's one unique username in the table.

adit
  • 32,574
  • 72
  • 229
  • 373
  • 1
    What engine are you using for this table (MyISAM, InnoDB, ...)? Do you have some `id` column? Better yet post your table schema – peterm Oct 02 '13 at 01:46
  • 1
    see also http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql?rq=1 – Thilo Oct 02 '13 at 01:49
  • @peterm yes I do have an id with each username, and it's only id and username.. using MyISAM – adit Oct 02 '13 at 01:58

3 Answers3

2

If you don't care what record to choose then just add a unique constraint while using IGNORE

ALTER IGNORE TABLE instagram_user ADD UNIQUE (username);

Here is SQLFiddle demo

and MySQL will do the job for you. You want to have that unique constraint anyway in order to keep your table out of duplicates in the future.

or alternatively you can do

DELETE t
  FROM instagram_user t JOIN
(
  SELECT username, MAX(id) id
    FROM instagram_user
   GROUP BY username
  HAVING COUNT(*) > 1
) q 
    ON t.username = q.username
   AND t.id <> q.id

This one will leave only a row with max id for rows that have duplicate usernames.

Here is SQLFiddle demo

peterm
  • 91,357
  • 15
  • 148
  • 157
0

Not sure this is for SQL server, you can try a similar code in mysql.

;With CteUsers AS(

SELECT *,ROW_NUMBER() OVER (PARTITION BY username Order by username) AS ROWID
FROM(

SELECT PkId, `username` 
FROM  `instagram_user`
)tbltemp)
SELECT * FROM CteUsers; 

This will result as follow

PkId username RowId
1     xx        1
2     xx        2
....

then delete where RowId > 1

;With CteUsers AS(

    SELECT *,ROW_NUMBER() OVER (PARTITION BY username Order by username) AS ROWID
    FROM(

    SELECT PkId, `username` 
    FROM  `instagram_user`
    )tbltemp)
  DELETE instagram_user WHERE PkId iN (SELECT PkId FROM CteUsers  WHERE ROWID > 1);
sudhansu63
  • 6,025
  • 4
  • 39
  • 52
0

This will give you the duplicates (i.e. the ones you need to delete) ...

select a.id, a.username from instagram_user a, instagram_user b
where a.username = b.username and a.id <> b.id
  and b.id = (select min(id) from instagram_user where username = a.username)

so the DELETE would be something like ...

delete from instagram_user where id in
 (select a.id from instagram_user a, instagram_user b
    where a.username = b.username and a.id <> b.id
      and b.id = (select min(c.id) from instagram_user c 
                  where c.username = a.username))
xagyg
  • 9,562
  • 2
  • 32
  • 29